Hello all,
Currently we have the following event, which contains both JSON and non-JSON data. Please help me remove the non-JSON part, and tell me where I need to set INDEXED_EXTRACTIONS or KV_MODE so that all the JSON fields are extracted automatically.
Nov 9 17:34:28 128.160.82.28 [local0.warning] <132>1 2024-11-09T17:34:28.436542Z AviVantage v-epswafhic2-wdc.hc.cloud.uk.hc-443 NILVALUE NILVALUE - {"adf":true,"significant":0,"udf":false,"virtualservice":"virtualservice-4583863f-48a3-42b9-8115-252a7fb487f5","report_timestamp":"2024-11-09T17:34:28.436542Z","service_engine":"GB-DRN-AB-Tier2-se-vxeuz","vcpu_id":0,"log_id":10181,"client_ip":"128.12.73.92","client_src_port":44908,"client_dest_port":443,"client_rtt":1,"http_version":"1.1","method":"HEAD","uri_path":"/path/to/monitor/page/","host":"udg1704n01.hc.cloud.uk.hc","response_content_type":"text/html","request_length":93,"response_length":94,"response_code":400,"response_time_first_byte":1,"response_time_last_byte":1,"compression_percentage":0,"compression":"","client_insights":"","request_headers":3,"response_headers":12,"request_state":"AVI_HTTP_REQUEST_STATE_READ_CLIENT_REQ_HDR","significant_log":["ADF_HTTP_BAD_REQUEST_PLAIN_HTTP_REQUEST_SENT_ON_HTTPS_PORT","ADF_RESPONSE_CODE_4XX"],"vs_ip":"128.160.71.14","request_id":"61e-RDl6-OZgZ","max_ingress_latency_fe":0,"avg_ingress_latency_fe":0,"conn_est_time_fe":1,"source_ip":"128.12.73.92","vs_name":"v-epswafhic2-wdc.hc.cloud.uk.hc-443","tenant_name":"admin"}
And where do I need to apply these configurations?
We have syslog servers with a Universal Forwarder (UF) installed that send the data, and a deployment server (DS) that pushes apps to the cluster manager and the deployer; from there the configurations are pushed out to the indexers and search heads.
As of now we have a props.conf on the cluster manager, which is pushed to the indexers.
Hi all, I have added the below stanza to props.conf and pushed it to the indexers. The JSON fields are being extracted, but the logs are getting duplicated. Please help me.
Verify in splunkd.log whether your Universal Forwarder (UF) or Heavy Forwarder (HF) is sending duplicate events.
Check inputs.conf and make sure crcSalt = <SOURCE> is set to avoid duplicate ingestion.
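For reference, this is the kind of monitor stanza to review on the UF; a minimal sketch only, where the path, index and sourcetype names are assumptions to adjust to your environment:

[monitor:///var/log/avi/waf.log]
sourcetype = sony_waf
index = waf
# <SOURCE> adds the full file path into the CRC calculation, so files with
# identical beginnings are tracked as separate files
crcSalt = <SOURCE>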
Please check this solution.
Solved: Re: Why would INDEXED_EXTRACTIONS=JSON in props.co... - Splunk Community
Hi @kiran_panchavat, can you please guide me on where to add your stanza? Indexers or search heads?
@splunklearner Yes, KV_MODE is for search time field extractions.
KV_MODE = [none|auto|auto_escaped|multi|multi:<multikv.conf_stanza_name>|json|xml]
* Used for search-time field extractions only.
* Specifies the field/value extraction mode for the data.
* Set KV_MODE to one of the following:
  * none - Disables field extraction for the host, source, or source type.
  * auto - Extracts field/value pairs separated by equal signs.
  * auto_escaped - Extracts field/value pairs separated by equal signs and honors \" and \\ as escaped sequences within quoted values. For example: field="value with \"nested\" quotes"
  * multi - Invokes the 'multikv' search command, which extracts fields from table-formatted events.
  * multi:<multikv.conf_stanza_name> - Invokes a custom multikv.conf configuration to extract fields from a specific type of table-formatted event. Use this option in situations where the default behavior of the 'multikv' search command is not meeting your needs.
  * xml - Automatically extracts fields from XML data.
  * json - Automatically extracts fields from JSON data.
* Setting to 'none' can ensure that one or more custom field extractions are not overridden by automatic field/value extraction for a particular host, source, or source type. You can also use 'none' to increase search performance by disabling extraction for common but nonessential fields.
* The 'xml' and 'json' modes do not extract any fields when used on data that isn't of the correct format (JSON or XML).
* If you set 'KV_MODE = json' for a source type, do not also set 'INDEXED_EXTRACTIONS = JSON' for the same source type. This causes the Splunk software to extract the JSON fields twice: once at index time and again at search time.
* When KV_MODE is set to 'auto' or 'auto_escaped', automatic JSON field extraction can take place alongside other automatic field/value extractions. To disable JSON field extraction when 'KV_MODE' is set to 'auto' or 'auto_escaped', add 'AUTO_KV_JSON = false' to the stanza.
* Default: auto
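As a minimal sketch of the search-time-only approach (assuming the [sony_waf] sourcetype name used later in this thread), the props.conf that would go to the search heads is just:

[sony_waf]
# search-time JSON extraction; do not combine with INDEXED_EXTRACTIONS = JSON
KV_MODE = json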
So, should I put this stanza on the Deployer or on the Cluster Manager?
Hi @splunklearner
To have this processed at ingest time you can do a simple INGEST_EVAL on your indexers.
== props.conf ==
[yourStanzaName]
TRANSFORMS = stripNonJSON
== transforms.conf ==
[stripNonJSON]
INGEST_EVAL = _raw:=replace(_raw, ".*- ({.*})", "\1")
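If you want to sanity-check the replace() before deploying, the same expression can be tested at search time against a trimmed copy of your sample event (just a sketch; the event below is shortened):

| makeresults
| eval _raw="Nov 9 17:34:28 128.160.82.28 [local0.warning] <132>1 2024-11-09T17:34:28.436542Z AviVantage v-epswafhic2-wdc.hc.cloud.uk.hc-443 NILVALUE NILVALUE - {\"adf\":true,\"client_ip\":\"128.12.73.92\",\"response_code\":400}"
| eval stripped=replace(_raw, ".*- ({.*})", "\1")
| table _raw stripped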
Please let me know how you get on and consider upvoting/karma this answer if it has helped.
Regards
Will
@splunklearner
If you go down the ingest-time approach then you will add the props.conf/transforms.conf within an app in your manager-apps folder on your Cluster Manager and then push it out to your indexers.
No changes should be required on your search heads if you go down that route, but feel free to evaluate the alternatives provided in this post too.
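For illustration, the layout on the Cluster Manager would look something like this (the app name here is just an example):

$SPLUNK_HOME/etc/manager-apps/avi_waf_parsing/local/props.conf
$SPLUNK_HOME/etc/manager-apps/avi_waf_parsing/local/transforms.conf

and the bundle is then pushed to the indexer peers from the Cluster Manager:

$SPLUNK_HOME/bin/splunk apply cluster-bundle --answer-yes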
I hope this helps.
Please let me know how you get on and consider upvoting/karma this answer if it has helped.
Regards
Will
Hi @livehybrid ,
I heard that search-time extractions are better than index-time extractions due to performance issues? Is that so? Please clarify.
Hi @splunklearner ,
I guess the answer really is "it depends"; however, in this scenario we are overwriting the original data with just the JSON, rather than adding an additional extracted field.
Search-time field extractions/evals/changes are executed every time you search the data, and in some cases they need to be evaluated before the search can be filtered down. For example, if you search for "uri=/test", you may find that at search time Splunk has to process every event to determine the uri field before it can filter the results. Being able to search against the URI without having to modify every event means it should be faster.
The disadvantage of index-time extractions is that they don't apply retrospectively to data you already have, whereas search-time extractions apply to everything currently indexed.
@splunklearner I have a standalone server, so you can try these settings on your heavy forwarder or indexers.
I don't have access to the UI; I need to do it from the backend only. Where can I put this props.conf? On the cluster manager or the deployer? Is this an index-time extraction or a search-time one?
@splunklearner I tried this using your sample data; please have a look.
[syslogtest]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
category=Custom
pulldown_type=true
SEDCMD-removeheader=s/^[^\{]*//g
KV_MODE=json
AUTO_KV_JSON=true
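Once the props are in place, you can sanity-check the extractions with a quick search using field names taken from your sample event (the index name is a placeholder):

index=your_index sourcetype=syslogtest
| head 10
| table uri_path response_code client_ip vs_name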
Hi @kiran_panchavat ,
Thanks for the answer.
But I read that KV_MODE = json is a search-time extraction, i.e. it should go on the search heads... but you are saying to put it on the indexers or heavy forwarders. Will that help? Please clarify.
Hi @kiran_panchavat ,
This is what is present in my current props.conf on the Cluster Manager for this sourcetype (it was copied from another sourcetype):
[sony_waf]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %b %d %H:%M:%S
SEDCMD-newline_remove = s/\\r\\n/\n/g
SEDCMD-formatxml =s/></>\n</g
LINE_BREAKER = ([\r\n]+)[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s
SHOULD_LINEMERGE = False
TRUNCATE = 10000
Now, do I need to add your settings to this props.conf and push it to the indexers? Or should I create a new props.conf on the Deployer that includes your stanza and push it to the search heads?
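For what it's worth, based on the answers above, one possible split (a rough sketch only, using @kiran_panchavat's SEDCMD approach and assuming the sourcetype name stays [sony_waf]) would be:

== Cluster Manager, manager-apps, pushed to the indexers ==
[sony_waf]
# existing index-time settings (LINE_BREAKER, TIME_*, TRUNCATE, ...) stay here
# strip everything before the first { so only the JSON is indexed
SEDCMD-removeheader = s/^[^\{]*//g

== Deployer, shcluster/apps, pushed to the search heads ==
[sony_waf]
KV_MODE = json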