I need to parse Tableau 8.2 JSON log files. Two sample rows from the log files are shown below:
{"ts":"2014-07-30T07:14:06.840","pid":2104,"tid":"1e88","sev":"info","req":"U9j9rgqgYVgAAA@kHr8AAAHs","sess":"B84F19DEC88D4362B9164D87A687CFBC-0:1","site":"Default","user":"nsawant","k":"begin-query","v":{"protocol":"c4561a0","query":"(restrict (aggregate (select (project (table [Extract].[Extract]) ((yr:Date:ok) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] \"Asia\")) (([none:Country / Region:nk] [none:Country / Region:nk])) ((avg:P: Population (count):ok))) ([none:Country / Region:nk] [avg:P: Population (count):ok]))"}}
{"ts":"2014-07-30T07:14:06.856","pid":2104,"tid":"1e88","sev":"info","req":"U9j9rgqgYVgAAA@kHr8AAAHs","sess":"B84F19DEC88D4362B9164D87A687CFBC-0:1","site":"Default","user":"nsawant","k":"end-query","v":{"protocol":"c4561a0","cols":2,"query":"(restrict (aggregate (select (project (table [Extract].[Extract]) ((yr:Date:ok) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] \"Asia\")) (([none:Country / Region:nk] [none:Country / Region:nk])) ((avg:P: Population (count):ok))) ([none:Country / Region:nk] [avg:P: Population (count):ok]))","rows":34,"elapsed":0.011}}
Is there a way to use Splunk to parse this and extract one value? If so, how?
Thank you in advance.
Best Regards,
Namrata Sawant
Bumping this topic again. Why? Because Answers seems to be fairly evenly divided between "use INDEXED_EXTRACTIONS" and "don't".
Here is someone who has actually benchmarked them both:
https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions
So thank you, Ryan, for laying out the issues in the most detail I've seen so far.
Upside of INDEXED_EXTRACTIONS: improved performance in some scenarios; note that he uses the :: operator, not "=".
Downside: increased resource usage and perhaps greatly increased disk usage (which stands to reason); and, as I discovered, no heavy forwarder (HF) routing for any sourcetype processed this way.
So you need to consider your use case carefully. In the past, and in the absence of this information, I would have gone INDEXED_EXTRACTIONS because it's easy and very reliable. Now I think I would use KV_MODE for json/XML stuff and DELIMS for CSV, unless there was a compelling reason not to.
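For reference, here is a minimal sketch of what those two search-time approaches might look like; the stanza and transform names are placeholders I made up, not anything from this thread.

props.conf:

# Search-time JSON extraction
[my_json_sourcetype]
KV_MODE = json

# Search-time CSV extraction, wired to a DELIMS transform
[my_csv_sourcetype]
REPORT-csv_fields = my_csv_delims

transforms.conf:

[my_csv_delims]
DELIMS = ","
FIELDS = "host", "status", "elapsed"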
It would be great if INDEXED_EXTRACTIONS allowed you to blacklist or whitelist fieldnames or patterns to conserve resources, especially disk. Right now it doesn't.
I have another piece of information about this. If you want to use routing via a Heavy Forwarder, DON'T use INDEXED_EXTRACTIONS for the time being for any events you want to route. It doesn't work because of the way the pipeline operates. There is an open SPL on this, which I am informed is not going to be fixed any time soon. Found out the hard way...
SPL-98594 Routing events to two different groups not working as expected.
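For context, "routing via a Heavy Forwarder" here means the usual props/transforms pattern, sketched below with made-up sourcetype and group names; it is this kind of setup that fails when the sourcetype also has INDEXED_EXTRACTIONS enabled.

props.conf:

[my_routed_sourcetype]
TRANSFORMS-routing = route_to_groupA

transforms.conf:

[route_to_groupA]
# Match every event and send it to the groupA output group
REGEX = .
DEST_KEY = _TCP_ROUTING
FORMAT = groupA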
You'll get a real performance boost when using Splunk 6.1's INDEXED_EXTRACTIONS = json switch.
[your_sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = none
Note, this does change some other behaviour. For example, there's a TIMESTAMP_FIELDS setting that exploits the JSON structure rather than specifying TIME_FORMAT or TIME_PREFIX expressions to manually walk through the structure. See the relevant sections in http://docs.splunk.com/Documentation/Splunk/6.1.2/Admin/Propsconf and http://docs.splunk.com/Documentation/Splunk/6.1.2/Data/Extractfieldsfromfileheadersatindextime for more info.
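For the Tableau logs in the question, that could look something like the sketch below; the sourcetype name is an assumption, and the TIME_FORMAT is derived from the "ts" values in the sample events.

[tableau_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
# Take the timestamp from the "ts" key of the parsed JSON
TIMESTAMP_FIELDS = ts
# ts looks like 2014-07-30T07:14:06.840
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N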
@martin_mueller what's the comparison between your suggestion above, and
[yourSourcetype]
KV_MODE=JSON
on the Search Head?
Disregard; I just saw @jmeyers_splunk's comment below.
Which props.conf needs to be updated, since there seem to be lots of them?
Any props.conf will work. Splunk merges all of them together based on rules of precedence (described here: http://docs.splunk.com/Documentation/Splunk/6.1/Admin/Wheretofindtheconfigurationfiles).
Typically, in a new install where you are trying to get familiar with Splunk, you will make your changes in either $SPLUNK_HOME/etc/system/local/props.conf OR $SPLUNK_HOME/etc/apps/search/local/props.conf.
In production, these settings will often be placed in a purpose-built custom app's local or default props.conf file.
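For example, a production layout might keep the stanza in a custom app path like the one below; the app name is hypothetical.

$SPLUNK_HOME/etc/apps/my_tableau_app/local/props.conf

containing, for instance:

[Tableau_log]
KV_MODE = json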
Namrata,
You can also have Splunk extract all of these fields automatically during index time by using the KV_MODE = JSON setting in props.conf. Give it a shot; I think it is a feature of Splunk 6+. For example:
[Tableau_log]
KV_MODE = JSON
It is actually really efficient, as Splunk has a built-in parser for it.
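Once the fields are extracted, a search over the sample events might look something like this; the index name is an assumption, and nested JSON keys come out dotted (e.g. v.elapsed):

index=main sourcetype=Tableau_log k=end-query
| stats avg(v.elapsed) AS avg_elapsed, sum(v.rows) AS total_rows BY user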
I downvoted this post because KV_MODE is used for search-time field extractions only.
Not exactly sure why that warranted a downvote, as search-time extractions are pretty normal in Splunk and the question didn't specifically call for indexed extractions; furthermore, indexed extractions are generally not recommended. Both answers are technically valid.
I downvoted because he said to use KV_MODE = json during index time.
Can you really use KV_MODE = json during index time?
Ah, you're right! I thought he was saying to do search-time extractions with KV_MODE = json, not index-time. My mistake.
Bumping this topic. Is there a specific use case for each of these two modes of extracting JSON data?
And what about the overhead, if any?
The documentation is fairly sketchy on this topic.
KV_MODE = json
tells Splunk to automatically perform search-time extractions on JSON data.
INDEXED_EXTRACTIONS = json
tells Splunk to create index-time extractions for the data.
In the first scenario you are saving disk space at the expense of processing power (and potentially search time).
In the second scenario you are saving processing power (and potentially search time) at the expense of disk space.
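Put another way, the two options look like this in props.conf; use one or the other for a given sourcetype, not both. The sourcetype name below is taken from the answer above.

# Option 1: search-time extraction (less disk, more work at search time)
[Tableau_log]
KV_MODE = json

# Option 2: index-time extraction (more disk, less work at search time)
[Tableau_log]
INDEXED_EXTRACTIONS = json
KV_MODE = none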
Well, I am a complete newbie and am still trying to figure out where and how I need to use the spath command. Do you have any video tutorial that demonstrates this?
Thank you!
I don't know of a video, but this search should work as a quick demo.
The JSON you posted does not validate because of the broken quotes around "Asia". I fixed that, then escaped the quotes (so that I could inject it via the search bar).
index=_internal | head 1 | eval f = "{\"ts\": \"2014-07-30T07:14:06.840\",\"pid\": 2104,\"tid\": \"1e88\",\"sev\": \"info\",\"req\": \"U9j9rgqgYVgAAA@kHr8AAAHs\",\"sess\": \"B84F19DEC88D4362B9164D87A687CFBC-0:1\",\"site\": \"Default\",\"user\": \"nsawant\",\"k\": \"begin-query\",\"v\": {\"protocol\": \"c4561a0\",\"query\": \"(restrict (aggregate (select (project (table [Extract].[Extract]) (([yr:Date:ok] (year [Date])) ([none:Region:nk] [Region]) ([none:Country / Region:nk] [Country / Region]))) (= [none:Region:nk] Asia))(([none: Country / Region: nk][none: Country / Region: nk]))(([avg: P: Population(count): ok](average[P: Population(count)]))))([none: Country / Region: nk][avg: P: Population(count): ok]))\"}}" | fields f | spath input=f | fields - f
Did you try the spath command?
http://docs.splunk.com/Documentation/Splunk/6.0/SearchReference/Spath
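As a concrete starting point, something along these lines should pull a single value (here, the row count) out of each end-query event; the index and sourcetype names are assumptions, so adjust them to wherever the Tableau logs are indexed:

index=main sourcetype=Tableau_log k=end-query
| spath output=query_rows path=v.rows
| table _time user query_rows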