Getting Data In

How to avoid indexing duplicates?

subasm
Loves-to-Learn

We are trying to ingest a large volume (petabytes) of data into Splunk.

The events are in JSON files with names like 'audit_events_ip-10-23-186-200_1.1512077259453.json'.

The pipeline is:

JSON files > Folder > UF > HF Cluster > Indexer Cluster

 

UF inputs.conf:

[batch:///folder]
_TCP_ROUTING = p2s_au_hf
crcSalt = <SOURCE>
disabled = false
move_policy = sinkhole
recursive = false
whitelist = \.json$

 

Events from specific files (NOT all) are getting duplicated: some files are indexed exactly twice.

Since the input is [batch:///], which is supposed to delete each file after reading it, and crcSalt = <SOURCE> is set, we are NOT able to figure out what creates the duplicates or why.

Would appreciate any help, reference or pointers. Thanks in advance!!!


subasm
Loves-to-Learn

The transfer of the source files to the folder is under our control, and we have verified that the data does NOT contain duplicates.

It seems to me the issue arises while the data is in flight: UF -> HF -> Indexers.

I'm not sure how the ACK works in this setup.
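
For reference, indexer acknowledgment is switched on per forwarding hop with useACK in outputs.conf. A minimal sketch for the UF, assuming the target group is the p2s_au_hf group named in _TCP_ROUTING above; the server addresses are placeholders, not values from this environment:

```ini
# outputs.conf on the UF (sketch; server list is a hypothetical placeholder)
[tcpout:p2s_au_hf]
server = hf1.example.com:9997, hf2.example.com:9997
# Hold sent data in the wait queue until the receiver acknowledges it
useACK = true
```

With useACK enabled, a forwarder that does not receive an acknowledgment in time (for example after a connection drop or a receiver restart) resends the data, which can itself produce duplicate events. In a UF -> HF -> Indexer chain, the same setting would presumably also be needed in the HF's outputs.conf so that the acknowledgment covers both hops.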


gcusello
SplunkTrust

Hi @subasm,

I'm quite sure that the issue is in the data.

Open a case with Splunk Support to be sure.

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @subasm,

your logs are probably rotated into a different file at midnight, so the crcSalt option causes the same data to be indexed twice. Did you try without this option?

Ciao.

Giuseppe


subasm
Loves-to-Learn

We are manually copying the files to the <DIR>, and from there the UF is supposed to pick them up.

So I don't think the same files are being rolled over at midnight.


gcusello
SplunkTrust

Hi @subasm,

if there isn't a rotation, the data is duplicated at the origin. In any case, if you don't use the crcSalt option, you are sure to avoid duplicates, because Splunk uses its internal record of already-read files (the fishbucket) to track data it has already ingested.

Ciao.

Giuseppe
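
The suggestion above can be sketched as the same batch stanza with crcSalt removed. This is a sketch under assumptions, not a verified fix: without the salt, Splunk decides whether a file was already read from a CRC of the start of the file, so files that begin with identical bytes may be skipped as already seen. The commented initCrcLength line is an optional setting to widen that check; the value shown is an arbitrary example, and its applicability to batch inputs should be confirmed in inputs.conf.spec for the version in use.

```ini
# inputs.conf on the UF (sketch: the original stanza without crcSalt)
[batch:///folder]
_TCP_ROUTING = p2s_au_hf
disabled = false
move_policy = sinkhole
recursive = false
whitelist = \.json$
# Optional, hypothetical value: check more than the default 256 bytes
# initCrcLength = 1024
```

Comparing duplicate counts on a small test folder before and after the change would show whether crcSalt is actually involved.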
