Remove non-JSON data to auto-extract JSON fields
Hello all,
Currently we have the following event, which contains both JSON and non-JSON data. Please help me remove the non-JSON part, and where do I need to set INDEXED_EXTRACTIONS or KV_MODE so that all JSON fields are auto-extracted?
Nov 9 17:34:28 128.160.82.28 [local0.warning] <132>1 2024-11-09T17:34:28.436542Z AviVantage v-epswafhic2-wdc.hc.cloud.uk.hc-443 NILVALUE NILVALUE - {"adf":true,"significant":0,"udf":false,"virtualservice":"virtualservice-4583863f-48a3-42b9-8115-252a7fb487f5","report_timestamp":"2024-11-09T17:34:28.436542Z","service_engine":"GB-DRN-AB-Tier2-se-vxeuz","vcpu_id":0,"log_id":10181,"client_ip":"128.12.73.92","client_src_port":44908,"client_dest_port":443,"client_rtt":1,"http_version":"1.1","method":"HEAD","uri_path":"/path/to/monitor/page/","host":"udg1704n01.hc.cloud.uk.hc","response_content_type":"text/html","request_length":93,"response_length":94,"response_code":400,"response_time_first_byte":1,"response_time_last_byte":1,"compression_percentage":0,"compression":"","client_insights":"","request_headers":3,"response_headers":12,"request_state":"AVI_HTTP_REQUEST_STATE_READ_CLIENT_REQ_HDR","significant_log":["ADF_HTTP_BAD_REQUEST_PLAIN_HTTP_REQUEST_SENT_ON_HTTPS_PORT","ADF_RESPONSE_CODE_4XX"],"vs_ip":"128.160.71.14","request_id":"61e-RDl6-OZgZ","max_ingress_latency_fe":0,"avg_ingress_latency_fe":0,"conn_est_time_fe":1,"source_ip":"128.12.73.92","vs_name":"v-epswafhic2-wdc.hc.cloud.uk.hc-443","tenant_name":"admin"}
Also, where do I need to apply these configurations?
We have syslog servers with a UF installed, and they send data to our deployment server. The DS pushes apps to the cluster master and the deployer, and distribution is done from there.
As of now we have a props.conf on the master, which pushes it to the indexers.
Hi all, I have added the below stanza in props.conf and pushed it to the indexers. The JSON fields are being extracted, but the logs are getting duplicated. Please help me.
Verify in splunkd.log whether your Universal Forwarder (UF) or Heavy Forwarder (HF) is sending duplicate events.
Check inputs.conf and make sure crcSalt = <SOURCE> is set to avoid duplicate ingestion.
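For reference, a minimal sketch of what that might look like in inputs.conf (the monitor path and sourcetype here are assumptions, not taken from your setup):

```
[monitor:///var/log/avi/syslog.log]
sourcetype = sony_waf
crcSalt = <SOURCE>
```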
Please check this solution.
Solved: Re: Why would INDEXED_EXTRACTIONS=JSON in props.co... - Splunk Community
Hi @kiran_panchavat, can you please guide me on where to add your stanza? Indexers or search heads?
@splunklearner Yes, KV_MODE is for search-time field extractions. From the props.conf spec:
KV_MODE = [none|auto|auto_escaped|multi|multi:<multikv.conf_stanza_name>|json|xml]
* Used for search-time field extractions only.
* Specifies the field/value extraction mode for the data.
* Set KV_MODE to one of the following:
  * none - Disables field extraction for the host, source, or source type.
  * auto - Extracts field/value pairs separated by equal signs.
  * auto_escaped - Extracts field/value pairs separated by equal signs and honors \" and \\ as escaped sequences within quoted values. For example: field="value with \"nested\" quotes"
  * multi - Invokes the 'multikv' search command, which extracts fields from table-formatted events.
  * multi:<multikv.conf_stanza_name> - Invokes a custom multikv.conf configuration to extract fields from a specific type of table-formatted event. Use this option in situations where the default behavior of the 'multikv' search command is not meeting your needs.
  * xml - Automatically extracts fields from XML data.
  * json - Automatically extracts fields from JSON data.
* Setting to 'none' can ensure that one or more custom field extractions are not overridden by automatic field/value extraction for a particular host, source, or source type. You can also use 'none' to increase search performance by disabling extraction for common but nonessential fields.
* The 'xml' and 'json' modes do not extract any fields when used on data that isn't of the correct format (JSON or XML).
* If you set 'KV_MODE = json' for a source type, do not also set 'INDEXED_EXTRACTIONS = JSON' for the same source type. This causes the Splunk software to extract the JSON fields twice: once at index time and again at search time.
* When KV_MODE is set to 'auto' or 'auto_escaped', automatic JSON field extraction can take place alongside other automatic field/value extractions. To disable JSON field extraction when 'KV_MODE' is set to 'auto' or 'auto_escaped', add 'AUTO_KV_JSON = false' to the stanza.
* Default: auto
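As a minimal example of the search-time setup described above (the stanza name is assumed to match your sourcetype):

```
[sony_waf]
KV_MODE = json
```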
So should I add this stanza on the deployer or the cluster manager?

Hi @splunklearner
To have this processed at ingest time you can do a simple INGEST_EVAL on your indexers.
== props.conf ==
[yourStanzaName]
TRANSFORMS = stripNonJSON
== transforms.conf ==
[stripNonJSON]
INGEST_EVAL = _raw:=replace(_raw, ".*- ({.*})", "\1")
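If it helps, the effect of that replace() can be sanity-checked outside Splunk. This is just an illustrative Python sketch of the same regex against an abbreviated version of your sample event; Splunk's INGEST_EVAL replace() uses PCRE, so the behaviour should match for this pattern:

```python
import re

# Abbreviated sample event: syslog header, then the JSON payload.
raw = ('Nov 9 17:34:28 128.160.82.28 [local0.warning] <132>1 '
       '2024-11-09T17:34:28.436542Z AviVantage vs-443 NILVALUE NILVALUE - '
       '{"adf":true,"client_ip":"128.12.73.92","response_code":400}')

# Same pattern as the INGEST_EVAL: greedily consume everything up to the
# last "- " that precedes the opening brace, keeping only the JSON.
stripped = re.sub(r'.*- ({.*})', r'\1', raw)
print(stripped)  # the bare JSON object
```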
Please let me know how you get on and consider upvoting/karma this answer if it has helped.
Regards
Will
@splunklearner
If you go down the ingest-time approach, you will add the props.conf/transforms.conf within an app in your manager-apps folder on your Cluster Manager and then push it out to your indexers.
No changes should be required on your search heads if you go down that route, but feel free to evaluate the alternatives provided in this post too.
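As a rough sketch of that layout (the app name here is just an example), followed by the bundle push to the peers:

```
$SPLUNK_HOME/etc/manager-apps/strip_nonjson/local/props.conf
$SPLUNK_HOME/etc/manager-apps/strip_nonjson/local/transforms.conf

# then push the configuration bundle to the indexers:
$SPLUNK_HOME/bin/splunk apply cluster-bundle
```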
I hope this helps.
Please let me know how you get on and consider upvoting/karma this answer if it has helped.
Regards
Will
Hi @livehybrid ,
I heard that search-time extractions are better than index-time extractions due to performance concerns. Is that so? Please clarify.

Hi @splunklearner ,
I guess the answer really is "it depends"; however, in this scenario we are overwriting the original data with just the JSON, rather than adding an additional extracted field.
Search-time field extractions/eval/changes are executed every time you search the data, and in some cases need to be evaluated before the search is filtered down. For example, if you search for "uri=/test", you may find that at search time Splunk needs to process all events to determine the uri field for each event before it can filter down. Being able to search against the URI without having to do any modification to every event means it should be faster.
The disadvantage of index-time extractions is that they don't apply retrospectively to data you have already indexed, whereas search-time extractions apply to everything currently indexed.
@splunklearner I have a standalone server; you can try these settings on your heavy forwarder or indexers.
I don't have access to the UI; I need to do it from the backend only. Where should I put this props.conf: on the cluster master or the deployer? And is this an index-time or a search-time extraction?
@splunklearner I tried this using your sample data; please have a look.
[syslogtest]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
category=Custom
pulldown_type=true
SEDCMD-removeheader=s/^[^\{]*//g
KV_MODE=json
AUTO_KV_JSON=true
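For what it's worth, the SEDCMD above just deletes everything before the first '{' in each event. Roughly equivalent, as a Python sketch against an abbreviated sample event:

```python
import re

raw = ('Nov 9 17:34:28 128.160.82.28 [local0.warning] <132>1 '
       '2024-11-09T17:34:28.436542Z AviVantage vs-443 NILVALUE NILVALUE - '
       '{"adf":true,"response_code":400}')

# SEDCMD-removeheader = s/^[^\{]*//g : remove every non-'{' character
# anchored at the start of the event, leaving the JSON payload.
cleaned = re.sub(r'^[^{]*', '', raw)
print(cleaned)
```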
Hi @kiran_panchavat ,
Thanks for the answer.
But I read that KV_MODE = json is a search-time extraction, i.e. it should be set on the search heads. You are saying to set it on the indexers or heavy forwarders; will that work? Please clarify.
Hi @kiran_panchavat ,
This is what is present in my current props.conf on the Cluster Manager for this sourcetype (it was copied from another sourcetype):
[sony_waf]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %b %d %H:%M:%S
SEDCMD-newline_remove = s/\\r\\n/\n/g
SEDCMD-formatxml =s/></>\n</g
LINE_BREAKER = ([\r\n]+)[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s
SHOULD_LINEMERGE = False
TRUNCATE = 10000
Now, do I need to add your settings to this props.conf and push it to the indexers? Or create a new props.conf on the deployer that includes your stanza and push it to the search heads?
