Splunk Search

How to extract JSON from my sample event data?

Path Finder

Hello,

We are trying to extract a substring (a JSON object) from one of the properties of the log:

{
Message: EventName="MessageEvent" Message="{"Timestamp":"2016-07-12T23:52:37.8061339+00:00","Level":"Information","MessageTemplate":"Test event processor open. partitionId: {partitionId}, offset: {offset}","Properties":{"EventId":1042,"activityId":"4537fec0-e72f-49cf-97df-754bbcc525b4","loggerMethod":"_open","offset":"5664","partitionId":"1","SourceContext":"TestEventReceiver.TestEventProcessor"}}" TraceSource="WaWorkerHost.exe"
Timestamp: 2016-07-12 23:54:12.543427+00:00
etag: W/"datetime'2016-07-12T23%3A54%3A12.5434277Z'"
}

The purpose of extraction is to be able to index the json properties rather than the string.
The RegEx that I came up with for extracting the substring is Message="(?<json>.*?)" TraceSource=, but I'm not sure how to approach the rest.

Can you please suggest an approach?
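(For anyone wanting to sanity-check that capture idea outside Splunk first: a minimal Python sketch, using a trimmed copy of the sample event above, showing that the captured substring parses as JSON.)

```python
import json
import re

# Trimmed copy of the sample event above; the inner JSON appears verbatim,
# with unescaped quotes, inside the Message="..." attribute.
raw = ('EventName="MessageEvent" '
       'Message="{"Timestamp":"2016-07-12T23:52:37.8061339+00:00",'
       '"Level":"Information","Properties":{"EventId":1042,"offset":"5664"}}" '
       'TraceSource="WaWorkerHost=.exe"'.replace('Host=.', 'Host.'))

# Capture everything between Message=" and " TraceSource=
m = re.search(r'Message="(.*)" TraceSource=', raw)
payload = json.loads(m.group(1))
print(payload["Level"])                   # Information
print(payload["Properties"]["EventId"])   # 1042
```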


Legend

@vikrant3007, as a general practice it is not a good idea to add custom field extraction at index time. What are you hoping to achieve by extracting at index time?

Below is from Splunk online documentation.

Caution: Do not add custom fields to the set of default fields that Splunk software automatically extracts and indexes at index time unless absolutely necessary. This includes fields such as timestamp, punct, host, source, and sourcetype. Adding to this list of fields can negatively impact indexing performance and search times, because each indexed field increases the size of the searchable index. Indexed fields are also less flexible--whenever you make changes to your set of fields, you must re-index your entire dataset. For more information, see Index time versus search time in the Managing Indexers and Clusters manual.

But if you must, here are instructions on how you can achieve that.

http://docs.splunk.com/Documentation/Splunk/6.4.2/Data/Configureindex-timefieldextraction


Path Finder

We understand the issue with adding the custom field, but our use case is:
We have structured logs (JSON) written into Azure Blob Storage by Azure Websites.

For ex:

{
    "Timestamp": "2016-06-30T20:39:51.6714837-07:00",
    "Level": "Error",
    "MessageTemplate": "Error running test silo worker role.",
    "Exception": "System.Exception: Silo failed to start correctly - aborting\r\n   at Orleans.Runtime.Host.AzureSilo.RunImpl(Nullable`1 cancellationToken)\r\n   at Orleans.Runtime.Host.AzureSilo.Run()\r\n   at TestSilo.WorkerRole.Run() in C:\\src\\upgrade-to-1.0rtm\\test\\TestSilo\\WorkerRole.cs:line 39",
    "Properties": {
        "loggerMethod": "_errorRunningWorkerRole",
        "activityId": "00000000-0000-0000-0000-000000000000",
        "EventId": {
            "Id": 1013
        },
        "SourceContext": "TestSilo.WorkerRole"
    }
}
But Azure Websites adds an overhead of its own properties around the JSON:
Message:  EventName="MessageEvent" Message="{"Timestamp":"2016-06-30T20:39:51.6714837-07:00","Level":"Error","MessageTemplate":"Error running test silo worker role.","Exception":"System.Exception: Silo failed to start correctly - aborting\r\n   at Orleans.Runtime.Host.AzureSilo.RunImpl(Nullable`1 cancellationToken)\r\n   at Orleans.Runtime.Host.AzureSilo.Run()\r\n   at TestSilo.WorkerRole.Run() in C:\\src\\upgrade-to-1.0rtm\\test\\TestSilo\\WorkerRole.cs:line 39","Properties":{"loggerMethod":"_errorRunningWorkerRole","activityId":"00000000-0000-0000-0000-000000000000","EventId":{"Id":1013},"SourceContext":"TestSilo.WorkerRole"}}" TraceSource="WaWorkerHost.exe"

Therefore, we want just our JSON to be indexed, rather than the JSON with the additional properties added around it.

Please let me know if additional details are required.


Path Finder

Team,

The intention is to extract the JSON during indexing, to utilize Splunk's indexing capability rather than doing the extraction on the Search Head. Can anyone please suggest an approach for achieving it?

Regards.


Builder

Does it look like XML/JSON in its raw format? If so, the | spath or | xmlkv commands might save you some time.

http://docs.splunk.com/Documentation/Splunk/6.4.2/SearchReference/Spath
http://docs.splunk.com/Documentation/Splunk/6.4.2/SearchReference/xmlkv


Path Finder

Currently it's in string format - as one of the properties of the JSON object.


Legend

*UPDATED*

Your latest comment makes sense. You are not trying to index the JSON as a custom field; what you are trying to do is index ONLY the JSON and exclude everything else. I am not familiar with the Azure app, but I would expect there to be a props.conf file on the indexer that has the config rules for indexing data. You will need to edit that to replace all text except the relevant JSON with an empty string using SEDCMD. Here is a simple sed command you could start with and tweak for your data.

SEDCMD-datapayload = s/.*(Message=.*)\sTraceSource/$1/g

Here's also a link to online documentation on how this works.

http://docs.splunk.com/Documentation/Splunk/6.4.2/Data/Anonymizedata
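(Before editing props.conf, the substitution can be prototyped outside Splunk. Here is a rough Python equivalent of that sed idea, sketched on a trimmed sample event; the pattern is adjusted slightly to also drop the Message=" prefix and the trailing TraceSource attribute, so only the JSON survives:)

```python
import re

# Trimmed sample event: metadata, then the JSON payload, then trailing attributes
event = ('Timestamp: 2016-07-12 23:54:12 EventName="MessageEvent" '
         'Message="{"Level":"Error","Properties":{"EventId":{"Id":1013}}}" '
         'TraceSource="WaWorkerHost.exe"')

# Keep only the JSON between Message=" and " TraceSource; drop everything else
cleaned = re.sub(r'.*Message="(.*)" TraceSource.*', r'\1', event, flags=re.DOTALL)
print(cleaned)   # {"Level":"Error","Properties":{"EventId":{"Id":1013}}}
```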


Path Finder

Hi Sundaresh,

After setting the transforms.conf, we are able to extract the internal JSON during the search time.

[datapayload-extract]
REGEX = Message=(?<DataPayload>.*)\sTrace
FORMAT = DataPayload::$1
DEST_KEY = indexQueue
KV_MODE = json

But the problem is, instead of coming out in JSON format, the DataPayload field comes out as:

 \"{\"Timestamp\":\"2016-06-30T20:38:03.7772659-07:00\",\"Level\":\"Error\",\"MessageTempl....

Can you please suggest how to get the field in JSON format, and also confirm whether the field will be indexed?


Legend

Hmmm... Remove the KV_MODE=json. Not sure what you mean by whether the data is indexed: all the data is in the index. The field extraction is happening at search time and applies to both historical and new data.


Path Finder

I have removed the KV_MODE, but on selecting the field, the values are displayed as below:

\"{\"Timestamp\":\"2016-06-30T20:31:06.3461923-07:00\",\"Level\":\"Error\",\"MessageTemplate\":\"Error running test silo worker role.\",\"Exception\":\"System.Exception: Silo failed to start correctly - aborting\\r\\n at Orleans.Runtime.Host.AzureSilo.RunImpl(Nullable`1 cancellationToken)\\r\\n at Orleans.Runtime.Host.AzureSilo.Run()\\r\\n at TestSilo.WorkerRole.Run() in C:\\\\src\\\\upgrade-to-1.0rtm\\\\test\\\\TestSilo\\\\WorkerRole.cs:line 39\",\"Properties\":{\"loggerMethod\":\"_errorRunningWorkerRole\",\"activityId\":\"00000000-0000-0000-0000-000000000000\",\"EventId\":{\"Id\":1013},\"SourceContext\":\"TestSilo.WorkerRole\"}}\"

But our expectation is to see it as JSON:
{
    "Timestamp": "2016-06-30T20:31:06.3461923-07:00",
    "Level": "Error",
    "MessageTemplate": "Error running test silo worker role.",
    "Exception": "System.Exception: Silo failed to start correctly - aborting\r\n   at Orleans.Runtime.Host.AzureSilo.RunImpl(Nullable`1 cancellationToken)\r\n   at Orleans.Runtime.Host.AzureSilo.Run()\r\n   at TestSilo.WorkerRole.Run() in C:\\src\\upgrade-to-1.0rtm\\test\\TestSilo\\WorkerRole.cs:line 39",
    "Properties": {
        "loggerMethod": "_errorRunningWorkerRole",
        "activityId": "00000000-0000-0000-0000-000000000000",
        "EventId": {
            "Id": 1013
        },
        "SourceContext": "TestSilo.WorkerRole"
    }
}

Isn't there a way to ignore the escape character \ and set the format of the extracted field to JSON?
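(As an illustration outside Splunk of what the unescaping needs to do - this is plain Python on a trimmed copy of the escaped value above, not a Splunk feature; inside Splunk the equivalent would be stripping the backslashes, e.g. with a SEDCMD or an eval replace, before the value can be treated as JSON:)

```python
import json

# Trimmed copy of the extracted DataPayload value, backslash-escaped as shown above
escaped = r'\"{\"Level\":\"Error\",\"Properties\":{\"EventId\":{\"Id\":1013}}}\"'

# Drop the enclosing \" pair, then turn every \" back into a plain quote
body = escaped[2:-2]
clean = body.replace(r'\"', '"')

data = json.loads(clean)
print(data["Properties"]["EventId"]["Id"])   # 1013
```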


Path Finder

Hi Sundaresh,

Thanks for the inputs. I have tried to put the following in the /opt/splunk/etc/system/local/props.conf of all the indexers:

[azure:storage:table:message_json]
SEDCMD-datapayload = s/.*(Message=.*)\sTraceSource/$1/g

And restarted the indexers, but still no change. I have also updated it in /opt/splunk/etc/apps/TA-Azure/local/props.conf, since the add-on is installed in the apps/TA-Azure location on the search head, but even then it didn't have any impact.
Can you please suggest an alternative approach?


Legend

This will only affect new data. Existing data will be unchanged.


Path Finder

Team,

Instead of using custom fields, is there a way for us to manipulate what is being read by the Splunk Add-on for Microsoft Azure? That is, can we make the Add-on read just the JSON object inside the string and ignore what surrounds it, like Message: EventName="MessageEvent" Message="{"Timestamp":"2016-07-12T23:52:37.8061339+00:00","Level":"Information","MessageTemplate":"Test event processor open. partitionId: {partitionId}, offset: {offset}","Properties":{"EventId":1042,"activityId":"4537fec0-e72f-49cf-97df-754bbcc525b4","loggerMethod":"_open","offset":"5664","partitionId":"1","SourceContext":"TestEventReceiver.TestEventProcessor"}}" TraceSource="WaWorkerHost.exe"


Legend

See my updated answer


Path Finder

I tested the following in a RegEx tester and it's working: Message="(?<json>.*?)" Trace
But I'm not sure how to approach the whole thing. Can you please throw some light on the approach?


Legend

If you want to extract this at index time, add this to your props

[unique_stanza_name]
EXTRACT-json_msg = Message=(?<message>.*)\sTrace

http://docs.splunk.com/Documentation/Splunk/6.4.2/Admin/Propsconf
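(A quick way to check that capture outside Splunk, on a trimmed sample event. Note that Python spells the named group (?P<...>) where Splunk's PCRE accepts (?<...>), and that the capture includes the surrounding double quotes, which is one reason the extracted value shows up quoted rather than as bare JSON:)

```python
import re

raw = ('EventName="MessageEvent" '
       'Message="{"Level":"Information","Properties":{"EventId":1042}}" '
       'TraceSource="WaWorkerHost.exe"')

# Same pattern as the EXTRACT above, in Python's named-group syntax
m = re.search(r'Message=(?P<message>.*)\sTrace', raw)
print(m.group('message'))
# "{"Level":"Information","Properties":{"EventId":1042}}"   (quotes included)
```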


Path Finder

Hi Sundaresh,

I have added the stanza in all 3 indexers but still no luck.
I'm using the Splunk Add-on for Microsoft Azure to read the logs from blob storage. Does that mean I need to add the stanza in the apps/TA-Azure/local/props.conf on the search head? I have added it there too, but still no luck. Can you please suggest any alternatives?


SplunkTrust

The EXTRACT stanza is for search-time field extraction, and it should be placed on the Search Head. You need to restart and ensure that the necessary sharing permissions are set.
