Getting Data In

Extract JSON data within the logs (JSON mixed with unstructured data)

koshyk
Super Champion

We got a requirement to extract information from a log file. The log file contains JSON data, which is bread and butter for Splunk, but it is mixed with unstructured text: the logging application prefixes each event with fields such as logging time | messageSeverity | class | thread.

The JSON message then starts as [{ json }]:

2013-12-23T14:55:09.574+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=102;_ThreadName=Thread-2;|2013-12-23 14:55:09,574 DEBUG parent-container$child#1-10 [] com.abc.transform.listeners.xyz- [{
  "timestamp" : "2013-12-23T14:55:09.558Z",
  "host" : "myPC",
  "event_id" : "1234",
  "customer_id" : "123456",
...
...

  "country" : "Canada",
  "product" : "iPad",
  "msg" : "Hello Guys",
  "transaction_id" : "100200300400"
  }
}]
|

Please note that this JSON is not fixed, so it can extend over extra lines.
How can we extract the JSON data alone into key-value pairs for easy presentation?

1 Solution

koshyk
Super Champion

The high-level idea for doing it automatically within props.conf and transforms.conf is something like below.

To extract JSON key-value pairs from mixed data:

The first transform extracts "json1", and the subsequent transform then works on that payload.

### inputs.conf
[monitor:///var/log/json.log]
sourcetype = myjson

### props.conf
[myjson]
REPORT-json = report-json,report-json-kv

### transforms.conf

[report-json]
# This will get the JSON payload from the logs.
# Put your specific logic here if you need to. Below is a very basic rule based on the { bracket.
REGEX = (?P<json1>{.+)

# Manually extract JSON key-value pairs (\s* allows spaces around the colon,
# as in the sample event)
[report-json-kv]
REGEX = \"(\w+)\"\s*:\s*\"([^\,\}\"]+)
FORMAT = $1::$2
MV_ADD = true
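The two transform regexes can be sanity-checked outside Splunk. Below is a minimal Python sketch (the sample event from the question is condensed to one line; the patterns mirror report-json and report-json-kv above):

```python
import re

# Sample mixed event from the question, JSON collapsed to one line for brevity
event = ('2013-12-23T14:55:09.574+0000|INFO|glassfish3.1.2|'
         'javax.enterprise.system.std.com.sun.enterprise.server.logging|'
         '_ThreadID=102;_ThreadName=Thread-2;|2013-12-23 14:55:09,574 DEBUG '
         'parent-container$child#1-10 [] com.abc.transform.listeners.xyz- '
         '[{ "country" : "Canada", "product" : "iPad", "msg" : "Hello Guys" }]')

# report-json: grab everything from the first "{" onwards
json1 = re.search(r'(?P<json1>{.+)', event).group('json1')

# report-json-kv: pull quoted key/value pairs out of the JSON payload
pairs = dict(re.findall(r'\"(\w+)\"\s*:\s*\"([^\,\}\"]+)', json1))
print(pairs)   # {'country': 'Canada', 'product': 'iPad', 'msg': 'Hello Guys'}
```

Note that this simple key-value regex only picks up string values; numeric values and nesting need additional rules.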


chadmedeiros
Path Finder

None of these answers solves the root problem, particularly in scenarios where the text log contains a JSON blob with nested JSON structure. It is quite confusing that the spath command is not available in props.conf; using spath in search is not an option for things like SIEMs, where parsing needs to be done automatically in order to fit into given data models or to map for CIM compliance.

0 Karma

anishinha
New Member

Is it possible to extract a field from a result contained in a JSON field?
Ex: result of field payment.log: {"data":{"lancto_dto_list":,"sld_dt":{"lim":10.00,"sld_disp":37.80,"disp":40.80}}}
I need only the field "disp".
Thanks

0 Karma

cyber_castle
Path Finder

Hi KoshyK,

Thanks for the props and transforms. In my case, I need to extract the timestamp from the traditional log header, as my JSON doesn't contain the actual timestamp value. Your help is much appreciated.

Oct 25 16:59:52 10.234.5.66 1 2018-10-25T16:59:52+01:00 127.0.0.1 Kentik-Detect-Alert - - - {"EventType":"ALARM_STATE_CHANGE","CompanyID":23681,"MitigationID":58694,"AlarmID":18459442,"AlarmState":"CLEAR","PolicyID":5106,"ThresholdID":10355,"ActivateSeverity":"minor","AlarmStart":"2018-10-25T15:31:47Z","AlarmEnd":"2018-10-25T15:59:51Z","LastActivate":"2018-10-25T15:42:17Z","AlertPolicyName":"SNS_SUBSCRIBER_POOL","AlarmStateOld":"ALARM","AlertKey":[{"DimensionName":"IP_dst","DimensionValue":"94.12.198.24"},{"DimensionName":"c_abc","DimensionValue":"SNS_SUBSCRIBER_POOL"}],"AlertValue":{"Unit":"bits","Value":4059448300},"AlertBaseline":{"Unit":"bits","Value":0},"AlertValueSecond":{"Unit":"packets/s","Value":402631.44},"AlertBaselineSource":"NO_USE_BASELINE"}
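One possible approach for the timestamp (a sketch only, untested; the sourcetype name is an assumption) is to point Splunk's timestamp recognition at the RFC 3339 timestamp in the syslog-style header via props.conf:

```ini
### props.conf (sketch -- the sourcetype name "kentik_alert" is an assumption)
[kentik_alert]
# Skip "Oct 25 16:59:52 10.234.5.66 1 " and read 2018-10-25T16:59:52+01:00
TIME_PREFIX = ^\w{3}\s+\d+\s[\d:]+\s\S+\s\d+\s
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%:z
MAX_TIMESTAMP_LOOKAHEAD = 30
```

Adjust TIME_PREFIX to match your actual header if it varies.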

0 Karma

cesaccenturefed
Path Finder

How can this be done automatically? eval with spath in props.conf only breaks out the specific field in the spath path. I'm dealing with hybrid logs like this too.

0 Karma

koshyk
Super Champion

Hi mate, in order to do this automatically, you need to have props.conf and transforms.conf and put the above logic in place. There are lots of examples on Splunk Answers. If not, let me know and I can create an example. Cheers

0 Karma

emiliebarse2
Explorer

I have not found any examples of how to extract nested JSON automatically in props.conf/transforms.conf. If you have such examples, it would be much appreciated.

0 Karma

koshyk
Super Champion

Hi mate, the accepted answer above will do exactly that.
report-json => extracts the pure JSON message from the mixed message. Adapt the logic to your events.
report-json-kv => extracts the (nested) JSON key-value pairs from that pure JSON message.

0 Karma

emiliebarse2
Explorer

Sorry for being unclear, but I need the JSON extracted with paths, to be able to distinguish between top-level fields and fields within nested elements that have the same name. spath works excellently at search time, but I need it in props.conf/transforms.conf.
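For reference, the path-qualified naming spath performs can be sketched as follows (a minimal Python illustration; the field names are made up):

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested JSON into spath-style dotted field names."""
    out = {}
    for key, val in obj.items():
        name = f"{prefix}{key}"
        if isinstance(val, dict):
            out.update(flatten(val, name + "."))
        else:
            out[name] = val
    return out

doc = json.loads('{"msg": "top", "nest1": {"msg": "inner"}}')
print(flatten(doc))   # top-level "msg" and nested "nest1.msg" stay distinct
```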

0 Karma

koshyk
Super Champion

Oh, OK. But set KV_MODE = json in transforms.conf, and in props.conf, assuming your nesting layer 1 is nest1:

 [report-json-kv]
 FIELDALIAS-result = nest1.* as *

Method 2 of the example would hopefully do that. Have a try with a simple JSON and extend that to nested JSON.

0 Karma

emiliebarse2
Explorer

I have tried the second method, but it does not work for me. I get an error message saying that KV_MODE is an invalid keyword in transforms.conf, and I get no extracted fields. As far as I can see from the Splunk documentation, using KV_MODE in transforms.conf is not supported.

koshyk
Super Champion

My mistake. I have removed the second method.

0 Karma

renjith_nair
SplunkTrust

From your event, extract the JSON part into a field and then use spath to process it. For example, from your event, extract a field my_data using rex and then pass it to spath:

     2013-12-23T14:55:09.574+0000|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=102;_ThreadName=Thread-2;|2013-12-23 14:55:09,574 DEBUG parent-container$child#1-10 [] com.abc.transform.listeners.xyz- [{   "timestamp" : "2013-12-23T14:55:09.558Z",   "host" : "myPC",   "event_id" : "1234",   "customer_id" : "123456",   "country" : "Canada",   "product" : "iPad",   "msg" : "Hello Guys",   "transaction_id" : "100200300400"   }]

Step by step

index=* sourcetype=json_data|rex "^(?:[^ \n]* ){7}(?P<my_data>.+)"|table my_data

You should be able to see only your JSON string there; if not, adjust the regex according to your requirements.

Then parse it with spath

index=* sourcetype=json_data|rex "^(?:[^ \n]* ){7}(?P<my_data>.+)"|table my_data|spath input=my_data

You should be able to see all your fields there.

Now rename/reuse it for further processing. For example

index=* sourcetype=json_data|rex "^(?:[^ \n]* ){7}(?P<my_data>.+)"|spath input=my_data|rename {}.host as MY_HOST,{}.event_id as MY_EVENT|table MY_HOST MY_EVENT
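For anyone wanting to check the `{7}` token-skip in that rex outside Splunk, here is a minimal Python sketch on a synthetic line (not the real event):

```python
import re

# Synthetic line: seven space-delimited tokens, then a JSON-ish payload
line = 't1 t2 t3 t4 t5 t6 t7 [{ "msg": "hello" }]'

# Same pattern as the rex: skip seven tokens, capture the rest as my_data
m = re.match(r'^(?:[^ \n]* ){7}(?P<my_data>.+)', line)
print(m.group('my_data'))   # -> [{ "msg": "hello" }]
```

Count the space-delimited tokens that precede your JSON and adjust the `{7}` accordingly.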

Hope this helps

Happy Splunking!

prees
Explorer

@koshyk Were you ever able to solve this? I would be interested in how, if so.

0 Karma

martin_mueller
SplunkTrust

You can extract the JSON part into a field and then run spath from that as an input field:

spath [input=<field>] [output=<field>] [path=<datapath> | <datapath>]
0 Karma

koshyk
Super Champion

I agree, if it's pure JSON data. But the above entry is a mix of traditional log info + JSON.

0 Karma