Hello,
We have JSON data coming into Splunk, and to extract it we run:
| rex "(?<json>\{.*\})"
| spath input=json
Now my ask is: I want this extraction to run by default for one or more sourcetypes, without having to add these commands to every search.
Do I need to do this during onboarding itself? If yes, please help me with a step-by-step procedure. We don't have a heavy forwarder. We have a deployment server, a cluster manager, and 3 indexers. The DS pushes apps to the manager, and from there the manager pushes apps to the peers.
Yes, it can. And no, it won't, because you won't be extracting fields at index time if you don't use INDEXED_EXTRACTIONS = json.
Splunk is very good at applying only the config that matters. So when in doubt, send the app to both the indexers and the search heads; Splunk usually just figures it out.
The duplicate-extraction issue happens when you do BOTH index time (INDEXED_EXTRACTIONS = json) AND search time (KV_MODE = json) in your props.conf. That's when they may collide, and it's why I say I almost never enable INDEXED_EXTRACTIONS = json: I would always prefer to review the search-time extraction first, and only then move the key fields I need to index time for performance reasons.
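As a concrete sketch of the two sides (the sourcetype name my:json:events is made up):

# index time, e.g. props.conf on the forwarder
[my:json:events]
INDEXED_EXTRACTIONS = json

# search time, props.conf on the search heads
[my:json:events]
KV_MODE = json

With both enabled you can see every field twice at search time. If you do keep INDEXED_EXTRACTIONS = json, the usual cure is KV_MODE = none (and AUTO_KV_JSON = false) for that sourcetype on the search heads; if you stay search-time only, simply never set INDEXED_EXTRACTIONS.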
Hi Karthikeya!
Are you parsing JSON out of a non-JSON payload? What would a sample event look like? Are the events not JSON to begin with?
Do you need the rest of the event in Splunk, or just the JSON part?
The short answer is that once you prove your extraction works for all your events in search, you can move the regex parsing into the "props and transforms" configuration, so you don't need to run it every time someone searches that sourcetype.
It is not possible to give you every step, as it depends on your data, outcomes, and environment, but from what you shared, see this documentation - https://docs.splunk.com/Documentation/SplunkCloud/9.3.2408/Knowledge/Createandmaintainsearch-timefie...
And can I try setting KV_MODE = json just to check my luck? What will be the consequences if it doesn't work? Please guide me through the steps....
Unfortunately, at this moment Splunk can only do automatic structured-data extraction if the whole event is well-formed structured data. So if your whole event is a JSON blob, Splunk can interpret it automatically. If it isn't, because it contains some header or footer, it's a no-go.
There is an open idea about this on ideas.splunk.com - https://ideas.splunk.com/ideas/EID-I-208
Feel free to upvote it.
For now all you can do is trim your original event to contain only the JSON part. (But then you might lose some data, I know.)
Hi @PickleRick ,
It's structured JSON we have, and Splunk is not extracting the field values automatically. Every time we need to give the command in search, which is not what the customer wants. They want this extraction to be the default.
I know it's JSON. But is it the whole event? Or does the event have additional pieces? So does the event look like this:
{ "a":"b", "c":"d" }
or more like this
<12>Nov 12 20:15:12 localhost whatever: data={"a":"b","c":"d"}
and you only want the json part parsed?
In the former case, it's enough to set KV_MODE to json (but KV_MODE=json doesn't handle multilevel field names). If it's the latter - that's the situation I described - Splunk cannot handle the structured _part_ automatically.
Yes, it's the latter case. But the search query I mentioned above (spath) is working perfectly. Is there any way I can achieve this? If this is not possible, can I make a macro of that query and use it in the search query? I don't know how the customer will feel about it.
Where do I need to set KV_MODE = json?
spath works because you're extracting just the JSON part with the rex command and only applying spath to that json field.
Yes, you can create a macro, but you will still need to manually invoke that macro in your search.
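A minimal sketch of that macro (the macro name extract_json is made up), defined in macros.conf on the search head or via Settings > Advanced search > Search macros:

[extract_json]
definition = rex "(?<json>\{.*\})" | spath input=json

Usage: index=your_index sourcetype=your_sourcetype `extract_json`. It is still a manual step, just a shorter one.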
Setting KV_MODE to json will not hurt (except maybe for a minor performance hit) but will not help.
So where do I need to specify this? And what will this setting do? Please explain.
Hi @Karthikeya ,
you have to add this option to the stanza in props.conf where your sourcetype is defined.
Then you have to add this props.conf to the add-on containing the inputs.conf, and also to the Search Head.
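For example (the app and sourcetype names are made up), in the add-on you already deploy:

# TA-myjson/local/props.conf
[my:json:events]
KV_MODE = json

The same stanza also needs to reach the Search Heads, because KV_MODE is a search-time setting.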
Ciao.
Giuseppe
@mattymo this is how my Splunk events look:
<12>Nov 12 20:15:12 localhost whatever: data={"a":"b","c":"d"} ... and the rest are JSON fields.
As of now we are giving the spath command in search, which is not acceptable to the customer. They want these JSON data fields to be extracted automatically once the onboarding is done.
Can I set INDEXED_EXTRACTIONS = json or KV_MODE = json to achieve this? I am not sure where to put these settings.
If I can achieve my requirement through this, please guide me through the steps at least.
No, you cannot use INDEXED_EXTRACTIONS or KV_MODE here, as the event is not JSON; only a part of the event is.
The way I have gone about this is to extract the JSON bit automatically using props/transforms, so the JSON bit ends up in its own field and can then be worked on.
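A minimal sketch of that idea (the stanza and sourcetype names are made up), reusing the rex pattern from the original post:

# transforms.conf
[extract-json-payload]
REGEX = (?<json>\{.*\})

# props.conf
[my:json:events]
REPORT-json = extract-json-payload

That gives every search on the sourcetype a json field automatically; users still need | spath input=json to expand it, unless the event itself is rewritten at ingest time.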
Otherwise, I would look at whether you really need to extract all the JSON, or whether you can just extract the known important values by pulling their key-value pairs with regex, or even look at using ingest-time eval to extract the non-JSON bits into fields, then dump them and keep only the JSON. But it really all depends on the end user and their requirements/use case.
Either way, this is a custom data onboarding and will require some work to get your use case done.
I usually ask why the sender is not using properly formatted JSON events to begin with, or if they can just insert kv pairs like "foo=bar" instead of the useless JSON blob thrown into an unstructured event. Shoving JSON into a non-JSON event is not really the flex people think it is... but hey, that's what devs do these days. Either way, cleaning up log formats can be hard, so you may have to just find a way that works for this end user.
I know the pain. I have to deal with this in OTel Collector events like this:
2025-01-09T20:29:14.015Z info internal/retry_sender.go:126 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "splunk_hec/platform_logs", "error": "Post \"https://http-inputs.foo.splunkcloud.com/services/collector\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)", "interval": "5.105973834s"}
AFAIK there is no way to deal with this so that a user doesn't have to spath the field, unless you hide it in a dashboard or fundamentally change the format of the event that's indexed.
Hi @mattymo ,
Thanks for your detailed explanation.
What format would be good to get the JSON data extracted automatically in Splunk? I can suggest that the sender follow that format if possible. And will there be any problem if they remove the unwanted material like the date and time?
And they want all the JSON field values to be extracted, not just specific ones, and it would be difficult to write regexes for all of them.
If they want the JSON parsed automatically, the sending agent/mechanism must send fully formed JSON events.
Review the event with them... it's not JSON. It's JSON inside an unstructured log line. In fact, this looks like some JSON-through-syslog adventure. Yum!
<12>Nov 12 20:15:12 localhost whatever: data={"a":"b","c":"d"}
The easiest way with syslog is to send kv pairs in the log events instead of JSON, like foo=bar bar=baz:
<12>Nov 12 20:15:12 localhost whatever: a=b c=d
Then Splunk can just pick out all the kv pairs automagically, instead of having to parse JSON to do the same thing. Many apps have this option in their logger, so you might get lucky. JSON provides no value here if we have to live with whatever pipeline is sending this syslog filled with JSON stuff.
If the app can't change its format, or the ingestion path can't be reviewed, then the next option is surgery on the inbound event, where Splunk config is used to parse the syslog facility, the timestamp (which doesn't even have the year or a precision timestamp), and the host into indexed fields, then remove this part of the event:
<12>Nov 12 20:15:12 localhost whatever: data=
so all that's left when Splunk indexes the _raw event is:
{"a":"b","c":"d"}
Which will allow KV_MODE = json to do its thing. You should never go straight to INDEXED_EXTRACTIONS = json.
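A minimal sketch of that surgery (the sourcetype name my:json:events is made up, and the sed pattern is written against the sample event above, so test it on real data first):

# props.conf on the indexers (your first full parsing tier, since there is no HF)
[my:json:events]
SEDCMD-strip_syslog_prefix = s/^<\d+>\w{3}\s+\d+\s+[\d:]+\s+\S+\s+[^:]+:\s+data=//

# props.conf on the search heads
[my:json:events]
KV_MODE = json

Timestamp extraction happens before SEDCMD in the ingestion pipeline, so _time is still parsed from the prefix before it gets removed.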
See this awesome .conf talk on the power of Splunk ingest_eval: https://conf.splunk.com/files/2020/slides/PLA1154C.pdf
Then see these examples on GitHub from the conf talk:
https://github.com/silkyrich/ingest_eval_examples/blob/master/default/props.conf
https://github.com/silkyrich/ingest_eval_examples/blob/master/default/transforms.conf
Or look into Splunk Edge Processor or Ingest Processor if you are a Cloud customer.
Options after that are reviewing the ingestion process and moving away from syslog to more modern collection, to get better data like ISO timestamps with timezone, etc. But whatever you use still needs to be able to format the event properly if you want the benefit of a structured data format.
I strongly suggest you consult with your Splunk Sales Engineer on the customer's account so that an expert or partner can help them achieve this and you can learn by working with them.
Is this an on-prem Enterprise user or a Cloud user?
It is Splunk Enterprise, not Cloud.
Which will allow KV_MODE = json to do its thing. You should never go straight to INDEXED_EXTRACTIONS = json.
Where do I need to set KV_MODE?
We have syslog servers with UFs installed, and we have a DS which pushes apps to the deployer and the manager. From there they are pushed to the peer nodes and search heads.
Where exactly can I put this attribute (KV_MODE = json)? We have props and transforms in the manager's apps, which get pushed to all the peers. I don't see any props on the search heads.
KV_MODE = json would go in the sourcetype stanza in props.conf on the Search Heads.
INGEST_EVAL would be in props/transforms on the indexers.
Technically you can just put all the configs everywhere and Splunk will sort it out.
Can I put KV_MODE = json in the already existing props.conf on the manager node so that it gets pushed to the peer nodes? But you said it should be on the search heads? Should I create a new app on the deployer and place props.conf (with KV_MODE = json) in its local directory, then deploy it to the search heads?
Sorry, I am asking so many questions; I am literally confused here...
The simplest way to put it: create a single app with all your sourcetype configs in it, then distribute that app using the appropriate mechanism for 1. the indexers (manager node) and 2. the search heads (deployer for a SHC, or DS/directly if standalone).
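For example (the app name TA-myjson is made up):

TA-myjson/
    default/
        props.conf
        transforms.conf

props.conf can hold both the search-time settings (KV_MODE) and the index-time settings (SEDCMD/TRANSFORMS); each tier only applies the settings relevant to its role, so shipping the identical app to both indexers and search heads is fine.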
OK, here my doubt is... can one app which contains props.conf (with KV_MODE = json) be distributed to both indexers and search heads? Could it lead to duplication of fields or events by any chance? I am asking about index-time vs. search-time extraction. Is it OK?