Getting Data In

Is there a way to check if Splunk is re-indexing certain files?

att35
Builder

Hi,

We use the Splunk Forwarder to monitor application data. There are multiple folders on a given server, each with the same set of log files. Since the folder names are the distinguishing factor, we use crcSalt=<SOURCE> so that Splunk treats each log file as a separate file.

We also make sure to lock the stanza to a specific extension as needed, e.g. logname.log, or log*.txt, so that rotated files are ignored.

That being said, I still want to find out whether there are situations where Splunk could be re-indexing files multiple times, which might warrant using initCrcLength instead.

Is this something that can be checked via search? Does the Splunk forwarder keep some kind of internal record or tracker showing that it is re-indexing a previously seen file?

Thanks,


gcusello
SplunkTrust

Hi @att35 ,

To my knowledge, Splunk doesn't index a file twice unless you use crcSalt=<SOURCE>.

In that case the full source path is added to the CRC check, so the file name (and not only the content) drives the decision, but a file with the same path and file name still cannot be indexed twice.

You can check if you have duplicated logs from the same file with a simple search like the following:

index=*
| stats count AS raw_count BY source, _raw
| where raw_count>1
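
If identical events do show up, comparing their index times can help tell a genuine re-index apart from an application that simply logs the same line twice. The following is only a sketch: the 24-hour window and the output field names (copies, gap_seconds, duplicate_events, max_gap_seconds) are my own assumptions, and index=* should be narrowed to the indexes you actually use, because grouping by _raw is expensive.

```
index=* earliest=-24h
| stats count AS copies, min(_indextime) AS first_indexed, max(_indextime) AS last_indexed BY host, source, _raw
| where copies>1
| eval gap_seconds=last_indexed-first_indexed
| stats sum(copies) AS duplicate_events, max(gap_seconds) AS max_gap_seconds BY host, source
| sort - duplicate_events
```

A large max_gap_seconds for a source suggests the same events were read again at a later time, whereas a gap close to zero more likely points to duplicate lines in the file itself.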

Ciao.

Giuseppe

isoutamo
SplunkTrust
If you find re-indexed files or events, this usually means that someone has removed the Splunk UF installation and reinstalled it. More precisely, it means that the fishbucket directory on the UF was removed.
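
The forwarder's own splunkd.log is normally forwarded to the _internal index, so you may also be able to see there when the file monitor decides to read a file from the beginning again. This is a rough sketch: it assumes the WatchedFile component logs a message containing "will re-read entire file" followed by file='...', which is what I have seen in splunkd.log, but the exact wording can differ between versions. The field names monitored_file, reread_count and last_seen are made up for this example.

```
index=_internal sourcetype=splunkd component=WatchedFile "will re-read entire file"
| rex "file='(?<monitored_file>[^']+)'"
| stats count AS reread_count, latest(_time) AS last_seen BY host, monitored_file
| convert ctime(last_seen)
| sort - reread_count
```

If this returns nothing for the period in question, the forwarder most likely did not restart any file from offset zero because of a checksum mismatch, at least as far as this message is concerned.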