Duplicate values because of json values

splunklearner · 02-10-2025

We have JSON fields that we want Splunk to auto-extract. Each event also contains some non-JSON data that has to be removed first, and then the JSON should be auto-extracted.

So I put the following props.conf on my indexers:

[sony_waf]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %b %d %H:%M:%S
LINE_BREAKER=([\r\n]+)
SEDCMD-removeheader=s/^[^\{]*//g
SHOULD_LINEMERGE = False
INDEXED_EXTRACTIONS = JSON
TRUNCATE = 20000

And this props.conf on my search head (SH):

[sony_waf]
KV_MODE = none
AUTO_KV_JSON = false

And this props.conf on my UF (which was already there):

[sony_waf]
NO_BINARY_CHECK = true
EVENT_BREAKER_ENABLE = true

When I did this, duplicate events started showing up.

When I remove INDEXED_EXTRACTIONS from the indexers and put it in the UF props.conf instead, the logs are not ingested at all.

I also tried removing KV_MODE and AUTO_KV_JSON on the SH and setting KV_MODE = json instead, but I still get the same duplication.

I'm completely confused here. Even after removing everything I added, duplicate logs are still coming in. I checked the log path at the source, and there are no duplicate logs there. I have even set crcSalt and still have the same issue.

Please guide me on the correct config and where each part should go.
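To make the indexer-side parsing concrete, here is a rough Python model of what two of the settings above do to incoming data: LINE_BREAKER=([\r\n]+) splits the stream into one event per line, and SEDCMD-removeheader deletes everything before the first "{" so that only the JSON remains. The header text (Feb 10 11:00:18 hypothetical-host avi:) and the two short JSON payloads are made up for illustration; the %b %d %H:%M:%S TIME_FORMAT only suggests the real header starts with a timestamp like that. This approximates the regex behavior and is not Splunk's implementation.

```python
import json
import re

# Hypothetical raw input: the header text before each "{" is an assumption
# made up for this example, not taken from the real sony_waf data.
RAW_CHUNK = (
    'Feb 10 11:00:18 hypothetical-host avi: {"log_id": 97378, "client_rtt": 2}\n'
    'Feb 10 11:00:18 hypothetical-host avi: {"log_id": 97377, "client_rtt": 1}\n'
)

# Python equivalents of the two props.conf regexes above.
LINE_BREAKER = re.compile(r"([\r\n]+)")  # LINE_BREAKER=([\r\n]+)
REMOVE_HEADER = re.compile(r"^[^\{]*")   # SEDCMD-removeheader=s/^[^\{]*//g


def break_lines(chunk: str) -> list[str]:
    """Split the stream into events; the captured newlines are discarded."""
    parts = LINE_BREAKER.split(chunk)
    # re.split with one capturing group alternates text, delimiter, text, ...
    # so event text sits at the even indices. Empty trailing text is dropped.
    return [p for i, p in enumerate(parts) if i % 2 == 0 and p]


def strip_header(event: str) -> str:
    """Delete everything before the first '{', as the SEDCMD does."""
    return REMOVE_HEADER.sub("", event)


events = [strip_header(e) for e in break_lines(RAW_CHUNK)]
for event in events:
    print(json.loads(event))  # each remaining event parses as JSON
```

One consequence of this regex: the header is cut at the first "{" on the line, so a "{" inside the header itself would leave a broken, non-JSON event.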

marycordova · 02-10-2025

Also... INDEXED_EXTRACTIONS uses up disk space... so I almost never use it 😊

@marycordova

splunklearner · 02-10-2025

@mattymo "please remove it from the config and let's focus on getting your data massaged and auto parsing at search time." How will it auto-parse the data at search time?

mattymo · 02-10-2025

As I explained before, KV_MODE on the search head is all that's needed to auto-parse well-formatted JSON.

See the props.conf spec file entry for KV_MODE here, and then the one for INDEXED_EXTRACTIONS here; note that it explains why you should NOT set both.

They are two means to a similar outcome, but INDEXED_EXTRACTIONS actually writes the values into the TSIDX files, whereas search-time extraction does not. You should always start with search-time extraction and only move the fields that absolutely need it to index time.

Please read these, and consider taking a few of the free Splunk EDU classes to learn more.

- MattyMo
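A toy Python model of why the spec warns against setting both: if the JSON fields are already stored at index time (INDEXED_EXTRACTIONS) and are then extracted again from _raw at search time (KV_MODE = json), each field ends up holding the same value twice. This is a simplified illustration that assumes top-level keys only and string values; it is not Splunk's extraction code, and the two-field sample event is trimmed down for brevity.

```python
import json
from collections import defaultdict

# Trimmed-down sample event (two fields only).
RAW = '{"log_id": 97378, "client_ip": "128.12.73.92"}'


def extract_json(raw: str) -> dict[str, str]:
    """Toy JSON extraction: top-level keys only, values as strings."""
    return {key: str(value) for key, value in json.loads(raw).items()}


def fields_at_search_time(
    raw: str, indexed_extractions: bool, kv_mode_json: bool
) -> dict[str, list[str]]:
    """Collect values from both extraction paths as multivalue fields."""
    fields: dict[str, list[str]] = defaultdict(list)
    if indexed_extractions:  # values written into the index at ingest time
        for key, value in extract_json(raw).items():
            fields[key].append(value)
    if kv_mode_json:  # the same _raw is parsed again when you search
        for key, value in extract_json(raw).items():
            fields[key].append(value)
    return dict(fields)


print(fields_at_search_time(RAW, indexed_extractions=True, kv_mode_json=True))
# {'log_id': ['97378', '97378'], 'client_ip': ['128.12.73.92', '128.12.73.92']}
print(fields_at_search_time(RAW, indexed_extractions=False, kv_mode_json=True))
# {'log_id': ['97378'], 'client_ip': ['128.12.73.92']}
```

Either path on its own yields one value per field; the advice in this thread is to keep only the search-time path.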

mattymo · 02-10-2025

I believe you are mixing scenarios here, leading to your confusion. Allow me to try and unwind this a bit.

Duplicate events are likely unrelated to your JSON extractions. Let's separate the two items:

1. Indexed Extractions - Let's start with your config. As I mentioned in the previous Answers post, you DO NOT need INDEXED_EXTRACTIONS=JSON for this use case, at least not to start. Furthermore, if you only put that setting on the indexers, as shown above, it does nothing. This setting is meant for properly formatted JSON events; it must be set on the forwarder, which sends the events to the indexers already parsed. Please read this doc explaining the feature.

Please take INDEXED_EXTRACTIONS out of the equation moving forward, OK? It is causing unnecessary confusion here because your original data IS NOT JSON. You do not need this setting to auto-parse JSON at search time, which should always be the first step when onboarding data. I almost ALWAYS try to avoid INDEXED_EXTRACTIONS, for reasons that are beyond the scope of getting you sorted. Please remove it from the config and let's focus on getting your data massaged and auto-parsing at search time.

2. Dupe Events - Duplicate events can happen for a few reasons, but generally none of them are related to JSON parsing. You can confirm duplicates by comparing the _raw of the events to check whether they are complete dupes.

See this helpful answer on how to validate whether they are truly duplicates; then we can go from there on why you have duplicate events. This should be completely unrelated to your JSON extractions, and is more likely due to your inputs configuration (your collector reading the same file twice) or to the events truly being duplicated in your source files.
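The _raw comparison amounts to counting identical raw strings. Inside Splunk that is typically a search ending in something like | stats count by _raw | where count > 1; the Python sketch below performs the same check over a list of exported _raw values. The three sample events are hypothetical.

```python
from collections import Counter


def find_duplicate_raws(raw_events: list[str]) -> dict[str, int]:
    """Return each _raw string that occurs more than once, with its count."""
    counts = Counter(raw_events)
    return {raw: n for raw, n in counts.items() if n > 1}


# Hypothetical exported _raw values: the first one was indexed twice.
sample = [
    '{"log_id": 97378, "client_rtt": 2}',
    '{"log_id": 97377, "client_rtt": 1}',
    '{"log_id": 97378, "client_rtt": 2}',
]
print(find_duplicate_raws(sample))  # {'{"log_id": 97378, "client_rtt": 2}': 2}
```

An exact match on _raw is deliberately strict: two events that share a timestamp but differ in any field are not counted as duplicates.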

I don't want you to continue twisting in the wind on this data onboarding; it's been ongoing for quite some time. Do you know who your Splunk account team is? Your Sales Engineer should be able to help you get unstuck. Please contact them, as we have various folks who can sit with you and show you the deal. If you don't know who they are, DM me and I can find them for you. No need to keep banging your head on the desk when we have plenty of trained experts who can help you navigate this learning path.

- MattyMo

splunklearner · 02-10-2025

2/10/25 11:00:18.000 AM

{
   adf: true
   avg_ingress_latency_fe: 0
   client_dest_port: 443
   client_ip: 128.12.73.92
   client_rtt: 2
   client_src_port: 23575
   conn_est_time_fe: 1
   log_id: 97378
   max_ingress_latency_fe: 0
   ocsp_status_resp_sent: true
   report_timestamp: 2025-02-10T11:00:18.780490Z
   request_state: AVI_HTTP_REQUEST_STATE_SSL_HANDSHAKING
   service_engine: GB-DRN-AB-Tier2-se-vxeuz
   significant: 0
   significant_log: [ ... ]
   source_ip: 128.12.73.92
   tenant_name: admin
   udf: false
   vcpu_id: 0
   virtualservice: virtualservice-e52d1117-b508-4a6d-9fb5-f03ca6319af7
   vs_ip: 128.160.71.101
   vs_name: v-wasphictst-wdc.hc.cloud.uk.fed-443
}

2/10/25 11:00:18.000 AM

{
   adf: true
   avg_ingress_latency_fe: 0
   client_dest_port: 443
   client_ip: 128.12.53.70
   client_rtt: 1
   client_src_port: 50068
   conn_est_time_fe: 1
   log_id: 97377
   max_ingress_latency_fe: 0
   ocsp_status_resp_sent: true
   report_timestamp: 2025-02-10T11:00:18.779796Z
   request_state: AVI_HTTP_REQUEST_STATE_SSL_HANDSHAKING
   service_engine: GB-DRN-AB-Tier2-se-vxeuz
   significant: 0
   significant_log: [ ... ]
   source_ip: 128.12.53.70
   tenant_name: admin
   udf: false
   vcpu_id: 0
   virtualservice: virtualservice-e52d1117-b508-4a6d-9fb5-f03ca6319af7
   vs_ip: 128.160.71.101
   vs_name: v-wasphictst-wdc.hc.cloud.uk.fed-443
}

Are these two events duplicates? We are receiving them the same way on our UFs as well.
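One quick way to check is to diff the two events field by field. The Python sketch below transcribes the fields shown in the two events above; the Python types are a guess, since the event viewer does not show quotes, and the collapsed significant_log array is left out. Six fields differ, including log_id (97378 vs 97377), client_ip and report_timestamp, so on these fields alone the two look like separate requests from different clients logged in the same second, rather than duplicates.

```python
# Field values transcribed from the two events above. Types are assumed;
# significant_log is collapsed in the viewer and therefore omitted.
EVENT_1 = {
    "adf": True, "avg_ingress_latency_fe": 0, "client_dest_port": 443,
    "client_ip": "128.12.73.92", "client_rtt": 2, "client_src_port": 23575,
    "conn_est_time_fe": 1, "log_id": 97378, "max_ingress_latency_fe": 0,
    "ocsp_status_resp_sent": True,
    "report_timestamp": "2025-02-10T11:00:18.780490Z",
    "request_state": "AVI_HTTP_REQUEST_STATE_SSL_HANDSHAKING",
    "service_engine": "GB-DRN-AB-Tier2-se-vxeuz", "significant": 0,
    "source_ip": "128.12.73.92", "tenant_name": "admin", "udf": False,
    "vcpu_id": 0,
    "virtualservice": "virtualservice-e52d1117-b508-4a6d-9fb5-f03ca6319af7",
    "vs_ip": "128.160.71.101", "vs_name": "v-wasphictst-wdc.hc.cloud.uk.fed-443",
}
EVENT_2 = {
    "adf": True, "avg_ingress_latency_fe": 0, "client_dest_port": 443,
    "client_ip": "128.12.53.70", "client_rtt": 1, "client_src_port": 50068,
    "conn_est_time_fe": 1, "log_id": 97377, "max_ingress_latency_fe": 0,
    "ocsp_status_resp_sent": True,
    "report_timestamp": "2025-02-10T11:00:18.779796Z",
    "request_state": "AVI_HTTP_REQUEST_STATE_SSL_HANDSHAKING",
    "service_engine": "GB-DRN-AB-Tier2-se-vxeuz", "significant": 0,
    "source_ip": "128.12.53.70", "tenant_name": "admin", "udf": False,
    "vcpu_id": 0,
    "virtualservice": "virtualservice-e52d1117-b508-4a6d-9fb5-f03ca6319af7",
    "vs_ip": "128.160.71.101", "vs_name": "v-wasphictst-wdc.hc.cloud.uk.fed-443",
}


def differing_fields(a: dict, b: dict) -> list[str]:
    """Names of fields whose values differ or that exist in only one event."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


print(differing_fields(EVENT_1, EVENT_2))
# ['client_ip', 'client_rtt', 'client_src_port', 'log_id', 'report_timestamp', 'source_ip']
```

True duplicates would produce an empty list here, because their _raw, and therefore every field, is identical.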
