We are trying to ingest a large volume (petabytes) of data into Splunk. The events arrive as JSON files named like:

audit_events_ip-10-23-186-200_1.1512077259453.json

The pipeline is: JSON files > folder > UF > HF cluster > indexer cluster.

UF inputs.conf:

[batch:///folder]
_TCP_ROUTING = p2s_au_hf
crcSalt = <SOURCE>
disabled = false
move_policy = sinkhole
recursive = false
whitelist = \.json$

We are seeing that events from specific files (NOT all of them) are getting duplicated: the events from some files are indexed exactly twice. Since this is a [batch://] input, which is supposed to delete each file after reading it, and crcSalt = <SOURCE> is set, we are not able to figure out why, or what is creating the duplicates.

Would appreciate any help, references, or pointers. Thanks in advance!
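One way to narrow this down is to rule out duplicates in the source files themselves before blaming the UF/HF/indexer pipeline. The sketch below is a hypothetical pre-ingestion check (not part of the setup above) that hashes every event in the batch folder and reports any event that appears more than once across files. It assumes the files are newline-delimited JSON, one event per line; adjust the reader if your files hold a single JSON array instead.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_events(folder):
    """Map each duplicated event (by hash of its raw line) to the
    list of (filename, line number) locations where it appears."""
    seen = defaultdict(list)
    for name in sorted(os.listdir(folder)):
        # Mirror the input's whitelist: only .json files
        if not name.endswith(".json"):
            continue
        path = os.path.join(folder, name)
        with open(path, "rb") as f:
            for lineno, line in enumerate(f, 1):
                line = line.strip()
                if not line:
                    continue
                # Hash the raw event bytes so identical events collide
                digest = hashlib.sha256(line).hexdigest()
                seen[digest].append((name, lineno))
    # Keep only events that occur more than once
    return {h: locs for h, locs in seen.items() if len(locs) > 1}
```

If this reports duplicates, the same events really do exist in two files (and crcSalt = <SOURCE> will make Splunk index both copies, since each file gets a distinct CRC); if it reports none, the duplication is happening somewhere in the forwarding/indexing path.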