I have a scheduled search that writes its results to local disk every 5 minutes using the outputcsv command. The file is named abc_dns.csv:
index=abc | fields _time _raw | fields - _indextime _sourcetype _subsecond | outputcsv abc_dns
I then forward that file to an external indexer.
inputs.conf
[monitor:///opt/splunk/var/run/splunk/csv/abc_dns.csv]
index = abc_dns_logs
sourcetype = abc_dns
#crcSalt = <SOURCE>
Below is the props.conf:
[abc_dns]
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = structured
TRANSFORMS-t1 = eliminate_header
transforms.conf:
[eliminate_header]
REGEX = ^"_time","_raw"$
DEST_KEY = queue
FORMAT = nullQueue
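For reference, with the search above the file outputcsv writes starts with exactly the header line this transform null-queues, followed by one quoted row per event. Roughly (the second line is a made-up illustrative row, with _time typically written as epoch seconds):

"_time","_raw"
"1672567200.000","<raw DNS event text here>"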
When I validate the results, I see that data is being duplicated on the external indexer.
I tried adding crcSalt = &lt;SOURCE&gt; to see if it made any difference. It seemed to at first, but after a while the data started duplicating again. The original logs do contain some genuine duplicates, but on top of that I am also seeing data from the monitored file itself being duplicated.
Can anyone please help with this?
If the CSV file generated by the search contains duplicate rows, the indexer will index them as-is. You need to remove the duplicates in your search.
Please try this:
index=abc | fields _time _raw | fields - _indextime _sourcetype _subsecond | dedup _raw | outputcsv abc_dns
Since your search keeps only _time and _raw, deduplicating on _raw drops repeated events; use dedup _time _raw instead if the same event text at different times should be kept.
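If duplicates still show up after this, a quick check on the destination index can tell you whether they come from the source data or from the file being re-read; a sketch using the index and sourcetype from your inputs.conf:

index=abc_dns_logs sourcetype=abc_dns | stats count by _raw | where count > 1

Any rows returned are events that appear more than once on the external indexer.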
-- Hope this helps