Getting Data In

Duplicate values because of json values

splunklearner
Communicator

We have JSON fields that should be auto-extracted in Splunk. Each event has some non-JSON data in front of the JSON, which has to be removed before the fields can be auto-extracted.

So I put the following props.conf on my indexers:

[sony_waf] 

TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %b %d %H:%M:%S
LINE_BREAKER=([\r\n]+)
SEDCMD-removeheader=s/^[^\{]*//g
SHOULD_LINEMERGE = False
INDEXED_EXTRACTIONS = JSON
TRUNCATE = 20000
 

and this props.conf on my SH:

[sony_waf]
KV_MODE = none
AUTO_KV_JSON = false

 

props.conf on my UF (already in place from before):

[sony_waf]
NO_BINARY_CHECK = true
EVENT_BREAKER_ENABLE = true

When I did this, duplicate events started appearing.

When I remove INDEXED_EXTRACTIONS from the indexers and put it in the UF props.conf instead, logs are not ingested at all.

I also tried KV_MODE = json on the SH in place of KV_MODE = none and AUTO_KV_JSON = false, but the duplication is the same.

I am completely confused here. Even after removing everything I added, duplicate logs are still coming in. I checked the log path at the source and there are no duplicate logs there. I even set crcSalt, but the issue remains.

Please guide me on which config belongs in which place...


marycordova
SplunkTrust

Also... INDEXED_EXTRACTIONS uses up disk space... so I almost never use it 😊

@marycordova

splunklearner
Communicator

@mattymo "please remove it from the config and let's focus on getting your data massaged and auto parsing at search time." ---> How will it auto-parse the data at search time?


mattymo
Splunk Employee

As I explained before, KV_MODE on the search head is all that's needed to auto-parse well-formatted JSON.

See the props.conf spec file for KV_MODE, and then for INDEXED_EXTRACTIONS, noting that it explains why you should NOT set both.

They are two means to a similar outcome, but INDEXED_EXTRACTIONS actually writes the field values into the TSIDX files, whereas search-time extraction does not. You should always start with search time and only move the fields that absolutely need it to index time.

Please read this and consider taking a few of the Free Splunk EDU classes to learn more 
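To make the difference between the two approaches concrete, here is a toy Python model. It is not how Splunk is implemented; `index_event`, `search_event`, and the sample header text are made up for illustration. It strips the non-JSON prefix the way the SEDCMD in the question does, then shows why turning on both index-time and search-time JSON extraction can surface each value twice.

```python
import json
import re
from collections import defaultdict

def strip_header(raw):
    # Same idea as SEDCMD s/^[^\{]*//g: drop everything before the first '{'.
    return re.sub(r"^[^{]*", "", raw, count=1)

def index_event(raw, indexed_extractions):
    """Index time: store the event, optionally with fields parsed up front."""
    body = strip_header(raw)
    stored_fields = json.loads(body) if indexed_extractions else {}
    return {"_raw": body, "indexed_fields": stored_fields}

def search_event(event, kv_mode_json):
    """Search time: combine stored fields with fields parsed on demand."""
    values = defaultdict(list)
    for name, value in event["indexed_fields"].items():
        values[name].append(value)
    if kv_mode_json:
        for name, value in json.loads(event["_raw"]).items():
            values[name].append(value)
    return dict(values)

# Hypothetical raw line; the header text before '{' is invented.
raw = 'Feb 10 11:00:18 waf-host {"log_id": 97378, "client_ip": "128.12.73.92"}'

search_only = search_event(index_event(raw, False), True)
both = search_event(index_event(raw, True), True)
print(search_only["log_id"])  # [97378]
print(both["log_id"])         # [97378, 97378]
```

In this model the search-time-only path stores nothing extra and still yields every field, which is the reason to start there.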

- MattyMo

mattymo
Splunk Employee

I believe you are mixing scenarios here, leading to your confusion. Allow me to try and unwind this a bit. 

Duplicate events are likely unrelated to your json extractions. Let's separate the two items:

1. Indexed Extractions - Let's start with your config. As I mentioned in the previous Answers post, you DO NOT need INDEXED_EXTRACTIONS = JSON for this use case, at least not to start. Furthermore, if you only put that setting on the indexers, as shown above, it does nothing. This setting is meant for properly formatted JSON events; it must be set on the forwarder, which then sends the data to the indexers already parsed. Please read this doc explaining the feature.

Please take INDEXED_EXTRACTIONS out of the equation moving forward, OK? It is causing unnecessary confusion here because your original data IS NOT JSON. You do not need this setting to auto-parse JSON at search time, which should always be the first step when onboarding data. I almost ALWAYS try to avoid INDEXED_EXTRACTIONS, for reasons that are beyond the scope of getting you sorted. Please remove it from the config and let's focus on getting your data massaged and auto-parsing at search time.

2. Dupe Events - Duplicate events can happen for a few reasons, but none of them are generally related to JSON parsing. You can confirm duplicates by comparing the _raw of the events and checking that they are complete dupes.

See this helpful answer for how to validate whether they are truly duplicates; then we can go from there on why you have duplicate events. This should be completely unrelated to your JSON extractions and is more likely due to your inputs configuration, where your collector is reading the same file twice, or the data truly is duplicated in your source files.
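As a rough offline illustration of the _raw comparison described above, the sketch below counts events whose full raw text occurs more than once. It assumes the raw events have already been exported as a list of strings; `find_duplicates` and the sample events are hypothetical.

```python
from collections import Counter

def find_duplicates(raw_events):
    """Return {raw_text: count} for events whose full raw text occurs more than once."""
    counts = Counter(raw_events)
    return {raw: n for raw, n in counts.items() if n > 1}

# Hypothetical raw events: the first two are character-for-character identical.
events = [
    '{"log_id": 1, "client_ip": "10.0.0.1"}',
    '{"log_id": 1, "client_ip": "10.0.0.1"}',
    '{"log_id": 2, "client_ip": "10.0.0.1"}',
]

dupes = find_duplicates(events)
for raw, count in dupes.items():
    print(count, raw)
```

Only events that match on the entire raw text are reported, so two events that merely share a timestamp are not flagged.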

 

I don't want you to continue twisting in the wind on this data onboarding; it's been ongoing for quite some time. Do you know who your Splunk account team is? Your Sales Engineer should be able to help you get unstuck. Please contact them, as we have various folks who can sit with you and show you the deal. If you don't know who they are, DM me and I can find them for you. No need to keep banging your head on the desk when we have plenty of trained experts who can help you navigate this learning path.


- MattyMo

splunklearner
Communicator
2/10/25
11:00:18.000 AM
{ [-]
   adf: true
   avg_ingress_latency_fe: 0
   client_dest_port: 443
   client_ip: 128.12.73.92
   client_rtt: 2
   client_src_port: 23575
   conn_est_time_fe: 1
   log_id: 97378
   max_ingress_latency_fe: 0
   ocsp_status_resp_sent: true
   report_timestamp: 2025-02-10T11:00:18.780490Z
   request_state: AVI_HTTP_REQUEST_STATE_SSL_HANDSHAKING
   service_engine: GB-DRN-AB-Tier2-se-vxeuz
   significant: 0
   significant_log: [ [+]
   ]
   source_ip: 128.12.73.92
   tenant_name: admin
   udf: false
   vcpu_id: 0
   virtualservice: virtualservice-e52d1117-b508-4a6d-9fb5-f03ca6319af7
   vs_ip: 128.160.71.101
   vs_name: v-wasphictst-wdc.hc.cloud.uk.fed-443
}

2/10/25
11:00:18.000 AM
{ [-]
   adf: true
   avg_ingress_latency_fe: 0
   client_dest_port: 443
   client_ip: 128.12.53.70
   client_rtt: 1
   client_src_port: 50068
   conn_est_time_fe: 1
   log_id: 97377
   max_ingress_latency_fe: 0
   ocsp_status_resp_sent: true
   report_timestamp: 2025-02-10T11:00:18.779796Z
   request_state: AVI_HTTP_REQUEST_STATE_SSL_HANDSHAKING
   service_engine: GB-DRN-AB-Tier2-se-vxeuz
   significant: 0
   significant_log: [ [+]
   ]
   source_ip: 128.12.53.70
   tenant_name: admin
   udf: false
   vcpu_id: 0
   virtualservice: virtualservice-e52d1117-b508-4a6d-9fb5-f03ca6319af7
   vs_ip: 128.160.71.101
   vs_name: v-wasphictst-wdc.hc.cloud.uk.fed-443
}

 

Are these two events duplicates? We receive them the same way on our UFs as well.
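One way to check is to diff the fields of the two events. The sketch below is only an illustration: `differing_fields` is a hypothetical helper, and the dictionaries hold a subset of the values copied from the two events above.

```python
def differing_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields whose values differ."""
    names = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in names if a.get(k) != b.get(k)}

# Subset of fields copied from the two events shown above.
first = {
    "log_id": 97378,
    "client_ip": "128.12.73.92",
    "client_src_port": 23575,
    "report_timestamp": "2025-02-10T11:00:18.780490Z",
    "vs_ip": "128.160.71.101",
}
second = {
    "log_id": 97377,
    "client_ip": "128.12.53.70",
    "client_src_port": 50068,
    "report_timestamp": "2025-02-10T11:00:18.779796Z",
    "vs_ip": "128.160.71.101",
}

diff = differing_fields(first, second)
print(sorted(diff))  # ['client_ip', 'client_src_port', 'log_id', 'report_timestamp']
```

Since log_id, client_ip, client_src_port, and report_timestamp all differ, these two events share a displayed timestamp but are not complete duplicates of each other.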
