Getting Data In

How to avoid indexing duplicates?

subasm
Loves-to-Learn

We are trying to ingest a large volume of data (petabytes) into Splunk.

The events are in JSON files with names like 'audit_events_ip-10-23-186-200_1.1512077259453.json'.

The pipeline is:

JSON files > Folder > UF > HF Cluster > Indexer Cluster

 

UF inputs.conf:

[batch:///folder]
_TCP_ROUTING = p2s_au_hf
crcSalt = <SOURCE>
disabled = false
move_policy = sinkhole
recursive = false
whitelist = \.json$

 

We are seeing events from specific files (NOT all) getting duplicated. Events from some files are indexed exactly twice.

Since this is a [batch:///] input, which is supposed to delete each file after reading it, and crcSalt = <SOURCE> is set, we cannot figure out what creates the duplicates.

Would appreciate any help, reference or pointers. Thanks in advance!!!


subasm
Loves-to-Learn

The transfer of source files to the folder is under our control, and we have verified that the data there contains no duplicates.

It seems to me the issue occurs while the data is in flight: UF -> HF -> Indexers.

I am not sure how ACK (indexer acknowledgment) works in this setup.
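
For reference, a minimal sketch of how indexer acknowledgment would be enabled in outputs.conf on the UF, assuming the output group is the p2s_au_hf group named in _TCP_ROUTING. The server hostnames and port are placeholders, not our real values, and whether useACK is actually set anywhere in this pipeline is not confirmed here.

```ini
# UF outputs.conf (sketch; hostnames and port below are hypothetical)
[tcpout:p2s_au_hf]
server = hf1.example.com:9997, hf2.example.com:9997
# Ask the receiving tier to acknowledge each block of data it accepts
useACK = true
```

With useACK = true, a forwarder that does not receive an acknowledgment (for example, because the connection dropped) resends the data, which is a documented way for duplicates to appear. For acknowledgment to cover the whole path, the outputs.conf on the HF tier toward the indexers would need the same setting.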


gcusello
SplunkTrust

Hi @subasm,

I'm quite sure that the issue is in the data.

Open a case with Splunk Support to be sure.

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @subasm,

your logs are probably rotated into a different file at midnight, so the crcSalt option causes the same data to be indexed again. Did you try without this option?

Ciao.

Giuseppe


subasm
Loves-to-Learn

We are manually copying the files to the <DIR>, and from there the UF is supposed to pick them up.

So I don't think the same files are being rolled over at midnight.


gcusello
SplunkTrust

Hi @subasm,

if there isn't a rotation, the data are duplicated at the origin. Anyway, if you don't use the crcSalt option you can be sure to avoid duplicates, because Splunk uses its fishbucket to keep track of the data it has already ingested.
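
As a sketch of that suggestion, this is the UF stanza from the original post with only the crcSalt line removed; everything else is unchanged. One caveat, as far as I know: without crcSalt, Splunk identifies a file by a CRC of its first bytes (initCrcLength, 256 by default), so files that begin with identical content may be skipped as already seen.

```ini
# UF inputs.conf (sketch: original stanza minus crcSalt)
[batch:///folder]
_TCP_ROUTING = p2s_au_hf
disabled = false
move_policy = sinkhole
recursive = false
whitelist = \.json$
```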

Ciao.

Giuseppe
