Getting Data In

How to avoid indexing duplicates?

subasm
Loves-to-Learn

We are trying to ingest a large volume (petabytes) of data into Splunk.

The events are in JSON files with names like 'audit_events_ip-10-23-186-200_1.1512077259453.json'.

The pipeline is:

JSON files > Folder > UF > HF Cluster > Indexer Cluster

 

UF inputs.conf:

[batch:///folder]
_TCP_ROUTING = p2s_au_hf
crcSalt = <SOURCE>
disabled = false
move_policy = sinkhole
recursive = false
whitelist = \.json$

 

We are seeing events from specific files (NOT all) getting duplicated. Events from the affected files are indexed exactly 2 times.

Since this is a [batch:///] input, which is supposed to delete each file after reading it, and crcSalt = <SOURCE> is set, we are NOT able to figure out why the duplicates occur or what creates them.

Would appreciate any help, reference or pointers. Thanks in advance!!!


subasm
Loves-to-Learn

The transfer of source files to the folder is under our control, and we have verified that the data is NOT duplicated at the source.

It seems to me there are issues while the data is in flight from UF -> HF -> Indexers.

I am not sure how the ACK works in this setup.
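For context, indexer acknowledgment is switched on per output group with useACK in outputs.conf. A minimal sketch of what that could look like on the UF follows. The group name comes from _TCP_ROUTING in the inputs.conf above; the HF host names and port 9997 are hypothetical placeholders, not values from this environment.

```ini
# outputs.conf on the UF (sketch only; server addresses are hypothetical)
[tcpout:p2s_au_hf]
# Hypothetical heavy forwarder addresses; replace with the real HFs.
server = hf1.example.com:9997, hf2.example.com:9997
# Keep each block of data in the wait queue until the receiver
# acknowledges it. If no acknowledgment arrives, the forwarder resends
# the block, which can result in the same events being indexed twice.
useACK = true
```

With useACK = true, a lost or late acknowledgment (for example, during a receiver restart or a network interruption) leads to a resend even if the data had already arrived. In a UF > HF > indexer chain the setting applies per hop, so the outputs.conf on the HFs is worth checking as well.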


gcusello
SplunkTrust

Hi @subasm,

I'm quite sure that the issue is in the data.

Open a case with Splunk Support to be sure.

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @subasm,

Your logs are probably rotated into a different file at midnight, so the crcSalt option duplicates your indexed data. Did you try without this option?

Ciao.

Giuseppe


subasm
Loves-to-Learn

We are manually copying the files to the <DIR>, and from there the UF is supposed to pick them up.

So I don't think the same files are being rolled over at midnight.


gcusello
SplunkTrust

Hi @subasm,

If there isn't a rotation, the data are duplicated at the origin. In any case, if you don't use the crcSalt option you are sure to avoid duplicates, because Splunk uses its own record (the fishbucket) to track already-ingested data.
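A minimal sketch of this suggestion is the input from the question with the crcSalt line removed. The commented-out initCrcLength line is an assumption added for illustration and does not come from this thread: without the salt, files that begin with identical bytes may be treated as already read, and a longer initial CRC window is one possible way around that. Check its behavior for batch inputs against the inputs.conf documentation before relying on it.

```ini
# inputs.conf on the UF (sketch): same stanza as in the question, minus crcSalt
[batch:///folder]
_TCP_ROUTING = p2s_au_hf
disabled = false
move_policy = sinkhole
recursive = false
whitelist = \.json$
# Assumption, not from this thread: lengthen the initial CRC window if
# many files share the same leading bytes. The value is illustrative only.
# initCrcLength = 1024
```

Testing this on a small sample folder first would show whether the duplicates disappear and whether any files are skipped.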

Ciao.

Giuseppe
