We have JSON fields that should be auto-extracted in Splunk. The events also contain some non-JSON data that has to be removed first, so that the JSON can then be auto-extracted.
So I put the following props.conf on my indexers:
[sony_waf]
and this props.conf on my SH:
props.conf on my UF (this was already in place before):
[sony_waf]
NO_BINARY_CHECK = true
EVENT_BREAKER_ENABLE = true
When I did this, duplicate events started showing up.
When I remove INDEXED_EXTRACTIONS from the indexers and keep it only in the UF props.conf, logs are not ingested at all.
I also tried setting KV_MODE = json on the SH after removing the earlier KV_MODE and AUTO_KV_JSON settings, but the duplication is still the same.
I am completely confused here. Even after removing everything I added, duplicate logs are still coming in. I checked the log path at the source and no duplicate logs show up there. I even set crcSalt, but the issue is still the same.
Please guide me on the correct config and the correct place to put it.
Also... INDEXED_EXTRACTIONS uses up disk space... so I almost never use it 😊
@mattymo "please remove it from the config and let's focus on getting your data massaged and auto parsing at search time." ---> How will it auto-parse the data at search time?
As I explained before, KV_MODE on the search head is all that's needed to auto-parse well-formatted JSON.
See the spec file for KV_MODE here and then for INDEXED_EXTRACTIONS here, noting it explains why you should NOT set both.
They are two means to a similar outcome, but INDEXED_EXTRACTIONS actually writes the field values into the TSIDX files at index time, whereas search-time extraction does not. You should always start with search time and only move fields that absolutely need it to index time.
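As a rough sketch of the search-time approach described above, the search head stanza could look like the following. It assumes the sourcetype name `sony_waf` from the question and that `_raw` is valid JSON by the time it is indexed. Treat it as a starting point, not a verified config for this environment.

```ini
# props.conf on the search head (search-time settings only)
[sony_waf]
# Extract fields from the JSON in _raw each time a search runs.
# Nothing extra is written to the index for these fields.
KV_MODE = json
```

With this in place, INDEXED_EXTRACTIONS is not set for the sourcetype on any tier.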
Please read this and consider taking a few of the Free Splunk EDU classes to learn more
I believe you are mixing scenarios here, leading to your confusion. Allow me to try and unwind this a bit.
Duplicate events are likely unrelated to your json extractions. Let's separate the two items:
1. Indexed Extractions - Let's start with your config. As I mentioned in the previous answers post, you DO NOT need INDEXED_EXTRACTIONS=JSON for this use case. At least not to start. Furthermore, if you only put that setting on the indexers, as shown above, it does nothing. This setting is meant for properly formatted JSON events and must be set on the forwarder, which then sends the data to the indexers already parsed. Please read this doc explaining the feature.
Please take INDEXED_EXTRACTIONS out of the equation moving forward, OK? It is causing unnecessary confusion here because your original data IS NOT JSON. You do not need this setting to auto-parse JSON at search time, which should always be the first step when onboarding data. I almost ALWAYS try to avoid INDEXED_EXTRACTIONS for reasons that are beyond the scope of getting you sorted. Please remove it from the config and let's focus on getting your data massaged and auto parsing at search time.
2. Dupe Events - Duplicate events can happen for a few reasons, but none of them are generally related to JSON parsing. You can confirm that events are complete dupes by comparing their _raw values.
See this helpful answer to see how you can validate whether they are truly duplicates, then we can go from there on why you have duplicate events. This should/will be completely unrelated to your JSON extractions, and is more likely due to your inputs configuration, where your collector is reading the same file twice, or the events truly are duplicated in your source files.
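Going back to the "massaging" mentioned in item 1, one common way to drop non-JSON text at index time is a SEDCMD on the indexers (or a heavy forwarder, if one sits in front of them). The sketch below assumes the non-JSON part is a prefix in front of the first `{` of each event. The thread does not show the raw format, so the regex and the class name `strip_non_json_prefix` are illustrative only.

```ini
# props.conf on the indexers (index-time parsing)
[sony_waf]
# Remove everything before the first "{" so that _raw is pure JSON.
# Events that already start with "{" are left unchanged.
SEDCMD-strip_non_json_prefix = s/^[^{]+//
```

Because SEDCMD runs at index time, it only affects newly ingested events, and Splunk has to be restarted on the instance where the setting is changed.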
I don't want you to continue twisting in the wind on this data onboarding; it's been ongoing for quite some time. Do you know who your Splunk account team is? Your Sales Engineer should be able to help you get unstuck. Please contact them, as we have various folks who can sit with you and show you the deal. If you don't know who they are, DM me and I can find them for you. No need to keep banging your head on the desk when we have plenty of trained experts who can help you navigate this learning path.
2/10/25 11:00:18.000 AM | { [-] adf: true avg_ingress_latency_fe: 0 client_dest_port: 443 client_ip: 128.12.73.92 client_rtt: 2 client_src_port: 23575 conn_est_time_fe: 1 log_id: 97378 max_ingress_latency_fe: 0 ocsp_status_resp_sent: true report_timestamp: 2025-02-10T11:00:18.780490Z request_state: AVI_HTTP_REQUEST_STATE_SSL_HANDSHAKING service_engine: GB-DRN-AB-Tier2-se-vxeuz significant: 0 significant_log: [ [+] ] source_ip: 128.12.73.92 tenant_name: admin udf: false vcpu_id: 0 virtualservice: virtualservice-e52d1117-b508-4a6d-9fb5-f03ca6319af7 vs_ip: 128.160.71.101 vs_name: v-wasphictst-wdc.hc.cloud.uk.fed-443 |
2/10/25 11:00:18.000 AM | { [-] adf: true avg_ingress_latency_fe: 0 client_dest_port: 443 client_ip: 128.12.53.70 client_rtt: 1 client_src_port: 50068 conn_est_time_fe: 1 log_id: 97377 max_ingress_latency_fe: 0 ocsp_status_resp_sent: true report_timestamp: 2025-02-10T11:00:18.779796Z request_state: AVI_HTTP_REQUEST_STATE_SSL_HANDSHAKING service_engine: GB-DRN-AB-Tier2-se-vxeuz significant: 0 significant_log: [ [+] ] source_ip: 128.12.53.70 tenant_name: admin udf: false vcpu_id: 0 virtualservice: virtualservice-e52d1117-b508-4a6d-9fb5-f03ca6319af7 vs_ip: 128.160.71.101 vs_name: v-wasphictst-wdc.hc.cloud.uk.fed-443 } |
Are these two events duplicates? We are receiving them the same way on our UFs as well.