How to parse and extract JSON log files in Splunk?
I need to parse Tableau 8.2 JSON log files. Two sample rows from the log files are shown below:
{"ts":"2014-07-30T07:14:06.840","pid":2104,"tid":"1e88","sev":"info","req":"U9j9rgqgYVgAAA@kHr8AAAHs","sess":"B84F19DEC88D4362B9164D87A687CFBC-0:1","site":"Default","user":"nsawant","k":"begin-query","v":{"protocol":"c4561a0","query":"(restrict (aggregate (select (project (table [Extract].[Extract]) ((yr:Date:ok) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] \"Asia\")) (([none:Country / Region:nk] [none:Country / Region:nk])) ((avg:P: Population (count):ok))) ([none:Country / Region:nk] [avg:P: Population (count):ok]))"}}
{"ts":"2014-07-30T07:14:06.856","pid":2104,"tid":"1e88","sev":"info","req":"U9j9rgqgYVgAAA@kHr8AAAHs","sess":"B84F19DEC88D4362B9164D87A687CFBC-0:1","site":"Default","user":"nsawant","k":"end-query","v":{"protocol":"c4561a0","cols":2,"query":"(restrict (aggregate (select (project (table [Extract].[Extract]) ((yr:Date:ok) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] \"Asia\")) (([none:Country / Region:nk] [none:Country / Region:nk])) ((avg:P: Population (count):ok))) ([none:Country / Region:nk] [avg:P: Population (count):ok]))","rows":34,"elapsed":0.011}}
Is there a way to use Splunk to parse this and extract one value? If so, how?
Thank you in advance.
Best Regards,
Namrata Sawant
Bumping this topic again. Why? Because Answers seems to be fairly evenly divided between "use INDEXED_EXTRACTIONS" and "don't".
Here is someone who has actually benchmarked them both:
https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions
So thank you Ryan for laying out the issues in the most detail I've seen so far.
Upside of INDEXED_EXTRACTIONS: improved performance in some scenarios; note he uses the :: operator, not "="
Downside: increased resource usage and perhaps greatly increased disk usage (stands to reason); and, as I discovered, no HF routing for any sourcetype processed this way.
So you need to consider your use case carefully. In the past, and in the absence of this information, I would have gone INDEXED_EXTRACTIONS because it's easy and very reliable. Now I think I would use KV_MODE for json/XML stuff and DELIMS for CSV, unless there was a compelling reason not to.
It would be great if INDEXED_EXTRACTIONS allowed you to blacklist or whitelist fieldnames or patterns to conserve resources, especially disk. Right now it doesn't.
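For illustration, here is how the indexed-field lookup described above differs from an ordinary search-time comparison (the sourcetype and field names are made up for this sketch, based on the sample events earlier in the thread). A search-time comparison looks like:

```
sourcetype=tableau_json user=nsawant
```

whereas the equivalent lookup against an indexed field uses the :: operator:

```
sourcetype=tableau_json user::nsawant
```

The :: form only makes sense when the field was actually written to the index (e.g. via INDEXED_EXTRACTIONS); against search-time extractions it will not match.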
I have another piece of information about this. If you want to use routing via a Heavy Forwarder, DON'T use INDEXED_EXTRACTIONS for the time being for any events you want to route. It doesn't work because of the way the pipeline operates. There is an open SPL on this which I am informed is not going to be fixed any time soon. Found out the hard way...
SPL-98594 Routing events to two different groups not working as expected.

You'll get a real performance boost by using Splunk 6.1's INDEXED_EXTRACTIONS = json switch.
[your_sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = none
Note that this also changes some other behaviour. For example, there's a TIMESTAMP_FIELDS setting that exploits the JSON structure, rather than having to specify TIME_FORMAT or TIME_PREFIX expressions to manually walk through the structure. See the relevant sections in http://docs.splunk.com/Documentation/Splunk/6.1.2/Admin/Propsconf and http://docs.splunk.com/Documentation/Splunk/6.1.2/Data/Extractfieldsfromfileheadersatindextime for more info.
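As a sketch, a props.conf stanza for the Tableau events posted above might use TIMESTAMP_FIELDS like this (the sourcetype name is an assumption, and the TIME_FORMAT is written to match the "ts" values in the sample events):

```
[tableau_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
TIMESTAMP_FIELDS = ts
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
```

Here TIMESTAMP_FIELDS tells Splunk to pull the timestamp from the named JSON field, so no TIME_PREFIX is needed.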
@martin_mueller what's the comparison between your suggestion above, and
[yourSourcetype]
KV_MODE=JSON
on the Search Head?
disregard; I just saw @jmeyers_splunk comment below
Which props.conf needs to be updated, since there seem to be lots of them?

Any props.conf will work. Splunk merges all of them together based on rules of precedence (described here: http://docs.splunk.com/Documentation/Splunk/6.1/Admin/Wheretofindtheconfigurationfiles).
Typically, in a new install where you are trying to get familiar with Splunk, you will make your changes in either $SPLUNK_HOME/etc/system/local/props.conf or $SPLUNK_HOME/etc/apps/search/local/props.conf.
In production, these settings are often placed in a purpose-built custom app's local or default props.conf file.
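As a sketch of the production layout just described, a purpose-built app (the name my_tableau_app is made up) would carry the setting like this:

```
$SPLUNK_HOME/etc/apps/my_tableau_app/
    default/
        props.conf    (settings shipped with the app)
    local/
        props.conf    (site-specific overrides; wins over default/)
```

Per the precedence rules linked above, an app's local/ directory overrides its default/, and both are merged with system-level configuration.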

Namrata,
You can also have Splunk extract all these fields automatically during index time using the KV_MODE = JSON setting in props.conf. Give it a shot; I think it is a feature of Splunk 6+. For example:
[Tableau_log]
KV_MODE = JSON
It is actually really efficient, as Splunk has a built-in parser for it.
I downvoted this post because KV_MODE is used for search-time field extractions only.
Not exactly sure why that warranted a downvote, as search-time extractions are pretty normal in Splunk and the question didn't call specifically for indexed extractions; furthermore, indexed extractions are generally not recommended. Both answers are technically valid.
I downvoted because he said to use KV_MODE = json during index time.
Can you really use KV_MODE = json during index time?
Ah, you're right! I thought he was saying to do search-time extractions with KV_MODE = json, not index-time. My mistake.
Bumping this topic. Is there a specific use case for these two modes of extracting JSON stuff?
And what about the overhead if any?
Documentation is fairly sketchy on this topic.

KV_MODE = json
tells Splunk to automatically perform search-time extractions on JSON data.
INDEXED_EXTRACTIONS = json
tells Splunk to create index-time extractions for the data.
In the first scenario you are saving disk at the expense of processing power (and potentially search time); in the second, you are saving processing power (and potentially search time) at the expense of disk.
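As a sketch, the two alternatives look like this in props.conf (pick one per sourcetype; the sourcetype name here is assumed):

```
# Search-time extraction: deploy to the search head
[tableau_json]
KV_MODE = json
```

```
# Index-time extraction: deploy to the indexer or forwarder
[tableau_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
```

Setting KV_MODE = none alongside INDEXED_EXTRACTIONS avoids extracting the same fields a second time at search time.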
Well, I am a complete newbie and still trying to figure out where and how I need to use the spath command. Do you have a video tutorial that demonstrates this?
Thank you!

I don't know of a video, but this search should work as a quick demo.
The JSON you posted does not validate because of the broken quotes around "Asia". I fixed that, then escaped the quotes (so that I could inject it via the search bar):
index=_internal | head 1 | eval f = "{\"ts\": \"2014-07-30T07:14:06.840\",\"pid\": 2104,\"tid\": \"1e88\",\"sev\": \"info\",\"req\": \"U9j9rgqgYVgAAA@kHr8AAAHs\",\"sess\": \"B84F19DEC88D4362B9164D87A687CFBC-0:1\",\"site\": \"Default\",\"user\": \"nsawant\",\"k\": \"begin-query\",\"v\": {\"protocol\": \"c4561a0\",\"query\": \"(restrict (aggregate (select (project (table [Extract].[Extract]) (([yr:Date:ok] (year [Date])) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] Asia))(([none: Country / Region: nk][none: Country / Region: nk]))(([avg: P: Population(count): ok](average[P: Population(count)]))))([none: Country / Region: nk][avg: P: Population(count): ok]))\"}}" | fields f | spath input=f | fields - f

Did you try the spath command?
http://docs.splunk.com/Documentation/Splunk/6.0/SearchReference/Spath
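For the sample events posted in the question, a minimal spath sketch might look like this (the sourcetype name is an assumption; v.query is a path taken from the sample JSON):

```
sourcetype=tableau_json k=begin-query
| spath output=query path=v.query
| table ts user query
```

Note that if KV_MODE = json is configured for the sourcetype, fields like v.query are extracted automatically at search time, and spath is mainly useful for ad-hoc extraction from raw or embedded JSON.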
