Getting Data In

Extract JSON data within the logs ( JSON mixed with unstructured data)

Super Champion

We have a requirement to extract information from a log file. The file contains JSON data, which is bread and butter for Splunk, but it is mixed with unstructured data: the logging application writes a prefix such as logging time | message severity | class | thread, etc.

After that prefix, the JSON message starts like - [{ json }].

2013-12-23T14:55:09.574+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=102;_ThreadName=Thread-2;|2013-12-23 14:55:09,574 DEBUG parent-container$child#1-10 [] com.abc.transform.listeners.xyz- [{
  "timestamp" : "2013-12-23T14:55:09.558Z",
  "host" : "myPC",
  "event_id" : "1234",
  "customer_id" : "123456",
...
...

  "country" : "Canada",
  "product" : "iPad",
  "msg" : "Hello Guys",
  "transaction_id" : "100200300400"
  }
}]
|

Please note that this JSON is not fixed, so it can extend over extra lines.
How can we extract the JSON data alone into key-value pairs for easy presentation?

1 Solution

Super Champion

The high-level idea for doing it automatically with props.conf and transforms.conf is something like the below.

To extract JSON key-value pairs from mixed data

The first transform extracts "json1", and the subsequent transform then acts on it

### inputs.conf
[monitor:///var/log/json.log]
sourcetype = myjson

### props.conf
[myjson]
REPORT-json = report-json,report-json-kv

### transforms.conf

[report-json]
# This will get the JSON payload from the logs.
# Put your specific logic here if you need to. Below is very basic logic based on the { bracket
REGEX = (?P<json1>{.+)

# Manually extract JSON key-value pairs (allowing optional spaces around the colon,
# as in the sample event above)
[report-json-kv]
REGEX = \"(\w+)\"\s*:\s*\"([^\,\}\"]+)
FORMAT = $1::$2
MV_ADD = true
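
Outside Splunk, the two-stage logic can be sanity-checked with a quick Python sketch (an illustration only, using a shortened version of the sample event from the question; the key-value regex here allows spaces around the colon, as the sample does):

```python
import re

# shortened version of the sample mixed event, collapsed onto one line
line = ('2013-12-23T14:55:09.574+0000|INFO|glassfish3.1.2|'
        'javax.enterprise.system.std.com.sun.enterprise.server.logging|'
        '_ThreadID=102;_ThreadName=Thread-2;|2013-12-23 14:55:09,574 DEBUG '
        'parent-container$child#1-10 [] com.abc.transform.listeners.xyz- '
        '[{ "timestamp" : "2013-12-23T14:55:09.558Z", "host" : "myPC", '
        '"country" : "Canada", "msg" : "Hello Guys" }]')

# stage 1, like [report-json]: everything from the first "{" onward
json1 = re.search(r'(?P<json1>{.+)', line).group('json1')

# stage 2, like [report-json-kv]: pull out quoted key/value pairs
pairs = dict(re.findall(r'"(\w+)"\s*:\s*"([^,}"]+)"', json1))
print(pairs['host'], pairs['country'])  # myPC Canada
```

Splunk applies the same idea at search time via the REPORT- setting; the sketch just shows why the two regexes, chained, yield clean key-value pairs from the mixed event.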


New Member

Is it possible to extract a field from a result contained in a JSON field?
For example, the field payment.log contains: {"data":{"lancto_dto_list":,"sld_dt":{"lim":10.00,"sld_disp":37.80,"disp":40.80}}}
and I need only the field "disp".
Thanks



Path Finder

Hi KoshyK,

Thanks for the props and transforms. In my case, I need to extract the timestamp from the traditional log part, as my JSON doesn't have the actual timestamp value. Your help is much appreciated.

Oct 25 16:59:52 10.234.5.66 1 2018-10-25T16:59:52+01:00 127.0.0.1 Kentik-Detect-Alert - - - {"EventType":"ALARM_STATE_CHANGE","CompanyID":23681,"MitigationID":58694,"AlarmID":18459442,"AlarmState":"CLEAR","PolicyID":5106,"ThresholdID":10355,"ActivateSeverity":"minor","AlarmStart":"2018-10-25T15:31:47Z","AlarmEnd":"2018-10-25T15:59:51Z","LastActivate":"2018-10-25T15:42:17Z","AlertPolicyName":"SNS_SUBSCRIBER_POOL","AlarmStateOld":"ALARM","AlertKey":[{"DimensionName":"IP_dst","DimensionValue":"94.12.198.24"},{"DimensionName":"c_abc","DimensionValue":"SNS_SUBSCRIBER_POOL"}],"AlertValue":{"Unit":"bits","Value":4059448300},"AlertBaseline":{"Unit":"bits","Value":0},"AlertValueSecond":{"Unit":"packets/s","Value":402631.44},"AlertBaselineSource":"NO_USE_BASELINE"}
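
One possible approach (an untested sketch; the sourcetype name and regex are assumptions, not from this thread) is to anchor timestamp recognition in props.conf at the ISO-8601 field that follows the syslog header, so Splunk takes the event time from the traditional log prefix rather than from anything inside the JSON:

```
[kentik_alert]
# skip the "Oct 25 16:59:52 10.234.5.66 1 " prefix, then read the ISO-8601 timestamp
TIME_PREFIX = ^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+\d+\s+
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%:z
MAX_TIMESTAMP_LOOKAHEAD = 30
```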


Path Finder

How can this be done automatically? eval with spath in props.conf only breaks out a specific field at the given spath path. I'm dealing with hybrid logs like this too.


Super Champion

Hi mate, in order to do this automatically, you need to have props.conf and transforms.conf and put in the above logic. There are lots of examples on Splunk Answers. If not, let me know and I can create an example. Cheers


Explorer

I have not found any examples of how to extract nested JSON automatically in props.conf/transforms.conf. If you have such examples, they would be much appreciated.


Super Champion

Hi mate, the accepted answer above will do exactly that.
report-json => extracts the pure JSON message from the mixed message. This part should hold your specific logic
report-json-kv => extracts the key-value pairs (including nested ones) from the pure JSON message


Explorer

Sorry for being unclear, but I need the JSON extracted with paths, to be able to distinguish between top-level fields and fields within nested elements that have the same name. spath works excellently at search time, but I need this in props.conf/transforms.conf.


Super Champion

Oh, OK. Then use KV_MODE = json in transforms.conf, and in props.conf, assuming your first nesting layer is nest1:

 [report-json-kv]
 FIELDALIAS-result = nest1.* as *

Method 2 of the example should hopefully do that. Try it with a simple JSON first and then extend it to nested JSON


Explorer

I have tried the second method, but it does not work for me. I get an error message saying that KV_MODE is an invalid keyword in transforms.conf, and I get no extracted fields. As far as I can see from the Splunk documentation, using KV_MODE in transforms.conf is not supported.

Super Champion

My mistake. I have removed the second method.


SplunkTrust

From your event, extract the JSON part to a field and then use spath to process it. For example, from your event, extract a field my_data using rex and then pass it to spath:

     2013-12-23T14:55:09.574+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=102;_ThreadName=Thread-2;|2013-12-23 14:55:09,574 DEBUG parent-container$child#1-10 [] com.abc.transform.listeners.xyz- [{   "timestamp" : "2013-12-23T14:55:09.558Z",   "host" : "myPC",   "event_id" : "1234",   "customer_id" : "123456",   "country" : "Canada",   "product" : "iPad",   "msg" : "Hello Guys",   "transaction_id" : "100200300400"   }]

Step by step

index=* sourcetype=json_data|rex "^(?:[^ \n]* ){7}(?P<my_data>.+)"|table my_data

You should be able to see only your JSON string there, if not adjust the regex according to your requirement.

Then parse it with spath

index=* sourcetype=json_data|rex "^(?:[^ \n]* ){7}(?P<my_data>.+)"|table my_data|spath input=my_data

You should be able to see all your fields there.

Now rename/reuse it for further processing. For example

index=* sourcetype=json_data|rex "^(?:[^ \n]* ){7}(?P<my_data>.+)"|spath input=my_data|rename {}.host as MY_HOST,{}.event_id as MY_EVENT|table MY_HOST MY_EVENT

Hope this helps
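
The token-skipping idea in that rex can be sanity-checked outside Splunk with a Python sketch (a simplified, hypothetical line standing in for the real event; this is not the Splunk pipeline itself):

```python
import re
import json

# simplified stand-in for an event with seven space-delimited prefix tokens
line = 'a b c d e f g {"host": "myPC", "event_id": "1234"}'

# same pattern as the rex above: skip 7 tokens, capture the rest
m = re.search(r'^(?:[^ \n]* ){7}(?P<my_data>.+)', line)

# json.loads stands in for what spath would then parse
data = json.loads(m.group('my_data'))
print(data['host'], data['event_id'])  # myPC 1234
```

On the real event, the number of skipped tokens (and the token boundaries) would need adjusting, exactly as the answer says.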

Explorer

@koshyk Were you ever able to solve this? If so, I would be interested in how.



SplunkTrust

You can extract the JSON part into a field and then run spath from that as an input field:

spath [input=<field>] [output=<field>] [path=<datapath> | <datapath>]

Super Champion

I agree, if it's pure JSON data. But the event above is a mix of traditional log info + JSON.
