Getting Data In

How to parse and extract JSON log files in Splunk?

Engager

I need to parse Tableau 8.2 JSON log files. Two sample rows of the log file are below:

{"ts":"2014-07-30T07:14:06.840","pid":2104,"tid":"1e88","sev":"info","req":"U9j9rgqgYVgAAA@kHr8AAAHs","sess":"B84F19DEC88D4362B9164D87A687CFBC-0:1","site":"Default","user":"nsawant","k":"begin-query","v":{"protocol":"c4561a0","query":"(restrict (aggregate (select (project (table [Extract].[Extract]) ((yr:Date:ok) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] \"Asia\")) (([none:Country / Region:nk] [none:Country / Region:nk])) ((avg:P: Population (count):ok))) ([none:Country / Region:nk] [avg:P: Population (count):ok]))"}}
{"ts":"2014-07-30T07:14:06.856","pid":2104,"tid":"1e88","sev":"info","req":"U9j9rgqgYVgAAA@kHr8AAAHs","sess":"B84F19DEC88D4362B9164D87A687CFBC-0:1","site":"Default","user":"nsawant","k":"end-query","v":{"protocol":"c4561a0","cols":2,"query":"(restrict (aggregate (select (project (table [Extract].[Extract]) ((yr:Date:ok) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] \"Asia\")) (([none:Country / Region:nk] [none:Country / Region:nk])) ((avg:P: Population (count):ok))) ([none:Country / Region:nk] [avg:P: Population (count):ok]))","rows":34,"elapsed":0.011}}

Is there a way to use Splunk to parse this and extract one value? If so, how?

Thank you in advance.

Best Regards,
Namrata Sawant


Contributor

Bumping this topic again. Why? Because Answers seems to be fairly evenly divided between "use INDEXED_EXTRACTIONS" and "don't".

Here is someone who has actually benchmarked them both:

https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions

So thank you Ryan for laying out the issues in the most detail I've seen so far.
Upside of INDEXED_EXTRACTIONS: improved performance in some scenarios; note he uses the :: operator, not "=".
Downside: increased resource usage and potentially much greater disk usage (which stands to reason); and, as I discovered, no heavy forwarder routing for any sourcetype processed this way.

So you need to consider your use case carefully. In the past, and in the absence of this information, I would have gone with INDEXED_EXTRACTIONS because it's easy and very reliable. Now I think I would use KV_MODE for JSON/XML data and DELIMS for CSV, unless there was a compelling reason not to.

It would be great if INDEXED_EXTRACTIONS allowed you to blacklist or whitelist fieldnames or patterns to conserve resources, especially disk. Right now it doesn't.
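For reference, here is a minimal sketch of the search-time alternatives mentioned above. The sourcetype and field names are hypothetical; adjust them to your own data.

props.conf:

[my_json_sourcetype]
KV_MODE = json

[my_csv_sourcetype]
REPORT-csv_fields = my_csv_delims

transforms.conf:

[my_csv_delims]
DELIMS = ","
FIELDS = ts, pid, sev, message

Both of these run at search time, so they add nothing to indexing cost or disk usage.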

Contributor

I have another piece of information about this. If you want to route events via a heavy forwarder, DON'T use INDEXED_EXTRACTIONS for the time being on any sourcetype you want to route. It doesn't work because of the way the pipeline operates. There is an open Splunk bug (SPL) for this, which I am informed will not be fixed any time soon. I found out the hard way...
SPL-98594 Routing events to two different groups not working as expected.
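For context, this is the kind of standard routing setup that the bug affects (group, sourcetype, and server names here are hypothetical):

props.conf on the heavy forwarder:

[my_json_sourcetype]
TRANSFORMS-route = route_to_secondary

transforms.conf:

[route_to_secondary]
REGEX = .
DEST_KEY = _TCP_ROUTING
FORMAT = secondary_indexers

outputs.conf:

[tcpout:secondary_indexers]
server = indexer2.example.com:9997

When the sourcetype also has INDEXED_EXTRACTIONS set, the events arrive at the heavy forwarder already parsed and skip the typing pipeline, so the TRANSFORMS-route never fires.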


SplunkTrust

You'll get a real performance boost from Splunk 6.1's INDEXED_EXTRACTIONS = json setting.

[your_sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = none

Note, this does change some other behaviour. For example, there's a TIMESTAMP_FIELDS setting that exploits the JSON structure rather than specifying TIME_FORMAT or TIME_PREFIX expressions to manually walk through the structure. See the relevant section in http://docs.splunk.com/Documentation/Splunk/6.1.2/Admin/Propsconf and http://docs.splunk.com/Documentation/Splunk/6.1.2/Data/Extractfieldsfromfileheadersatindextime for more info.
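For the Tableau sample above, an index-time configuration might look like this (the sourcetype name is hypothetical; the ts field comes from the posted events):

[tableau_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
TIMESTAMP_FIELDS = ts
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N

TIMESTAMP_FIELDS points Splunk at the JSON key holding the timestamp, so no TIME_PREFIX regex is needed to walk through the structure.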

Path Finder

@martin_mueller what's the comparison between your suggestion above, and

[yourSourcetype]
KV_MODE=JSON

on the Search Head?


Path Finder

disregard; I just saw @jmeyers_splunk comment below


Path Finder

Which props.conf needs to be updated, since there seem to be lots of them?


Splunk Employee

Any props.conf will work. Splunk merges all of them together based on rules of precedence (described here: http://docs.splunk.com/Documentation/Splunk/6.1/Admin/Wheretofindtheconfigurationfiles).

Typically in a new install, while you are getting familiar with Splunk, you will make your changes in either $SPLUNK_HOME/etc/system/local/props.conf or $SPLUNK_HOME/etc/apps/search/local/props.conf.

In production, these settings are often placed in a purpose-built custom app's local or default props.conf file.


Splunk Employee

Namrata,
You can also have Splunk extract all these fields automatically during index time using the KV_MODE = JSON setting in props.conf. Give it a shot; I believe it is a feature of Splunk 6+. For example:

[Tableau_log]
KV_MODE = JSON

It is actually really efficient, as Splunk has a built-in parser for it.
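Once the fields are extracted, pulling a single value out of the nested v object is a one-line search. A sketch against the sample events (sourcetype name as in the stanza above):

sourcetype=Tableau_log k=end-query
| spath path=v.elapsed output=elapsed
| table ts user elapsed

With KV_MODE = json the v.elapsed field is usually extracted automatically, so the explicit spath step is only needed when you want to target or rename one path.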

Explorer

I downvoted this post because KV_MODE is used for search-time field extractions only.


Explorer

Not exactly sure why that warranted a downvote, as search-time extractions are pretty normal in Splunk and the question didn't specifically call for indexed extractions. Furthermore, indexed extractions are generally not recommended. Both answers are technically valid.


Explorer

I downvoted because he said to use KV_MODE = json during index time.
Can you really use KV_MODE = json during index time?


Explorer

Ah, you're right! I thought he was saying to do search-time extractions with KV_MODE = json, not index-time. My mistake.


Contributor

Bumping this topic. Is there a specific use case for these two modes of extracting JSON stuff?

And what about the overhead if any?

Documentation is fairly sketchy on this topic.


Splunk Employee

KV_MODE = json tells Splunk to perform search-time extractions on JSON data automatically.
INDEXED_EXTRACTIONS = json tells Splunk to create index-time extractions for the data.

In the first scenario you save disk at the expense of processing power (and potentially search time).
In the second scenario you save processing power (and potentially search time) at the expense of disk.


Engager

Well, I am a complete newbie and still trying to figure out where and how to use the spath command. Do you have a video tutorial that demonstrates this?

Thank you!


Splunk Employee

I don't know of a video, but this search should work as a quick demo.

The json you posted does not validate because of the broken quotes around ' Asia '. I fixed that, then escaped the quotes (so that I could inject it via the search bar).

index=_internal
| head 1
| eval f = "{\"ts\": \"2014-07-30T07:14:06.840\",\"pid\": 2104,\"tid\": \"1e88\",\"sev\": \"info\",\"req\": \"U9j9rgqgYVgAAA@kHr8AAAHs\",\"sess\": \"B84F19DEC88D4362B9164D87A687CFBC-0:1\",\"site\": \"Default\",\"user\": \"nsawant\",\"k\": \"begin-query\",\"v\": {\"protocol\": \"c4561a0\",\"query\": \"(restrict (aggregate (select (project (table [Extract].[Extract]) (([yr:Date:ok] (year [Date])) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] Asia))(([none: Country / Region: nk][none: Country / Region: nk]))(([avg: P: Population(count): ok](average[P: Population(count)]))))([none: Country / Region: nk][avg: P: Population(count): ok]))\"}}"
| fields f
| spath input=f
| fields - f
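Against indexed data the same idea is shorter; assuming the sourcetype from the earlier answers:

sourcetype=Tableau_log k=end-query
| spath path=v.elapsed output=elapsed
| stats avg(elapsed) max(elapsed) by user

spath reads _raw by default, so input= is only needed when the JSON lives in another field, as in the demo search above.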
