
Splunk duplicating events every time file changes

bitbuck3t
New Member

I have created a directory to store log files that I pull from a remote machine. A cron job runs every x minutes and calls a script that rsyncs the files over, and Splunk is configured to monitor this directory. Using auth.log as an example, Splunk indexes that file as expected the first time it appears in the directory. After that, any time the file changes, Splunk re-indexes the entire file. So if there were 500 events initially, and after the next cron run the file contains 510 events (10 more than last time), Splunk shows 1,010 events (the original 500 plus all 510 again) instead of 510.

I also discovered that I can make Splunk reindex the entire file simply by running touch on it, which leaves the file's contents unchanged.

What I don't understand is that the *nix app automatically monitors /var/log on the Splunk machine itself, and that behaves as I would expect: only new events are added as a file changes, and running touch on any of those files does not cause it to be reindexed in full.

I have tried using rsync in append mode. I have also tried the atomic-rsync Perl script, which rsyncs the files to a temporary directory and, once everything has transferred, renames them over the old files. Nothing I have tried so far has worked.

I am new to Splunk, so I assume I am doing something wrong, but I really need to figure out what, because having my events constantly duplicated in full is not good.

My inputs.conf for the directory in question is:

[monitor:///usr/local/splunk/etc/apps/unix/local/remote_logs/machine1]
disabled = false
followTail = 0
host = 
host_segment = 9
index = os

I'm using Splunk 4.1.5 (build 85165) on FreeBSD 8.1-RELEASE.

amrit
Splunk Employee

Based on this:

I can trigger Splunk to reindex the entire file simply by using the touch command on the file

it sounds like you've enabled CHECK_METHOD=modtime in props.conf. This is a setting that abandons normal file tracking (for the specified files) and instead reindexes them in their entirety when the modtime changes.
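As an illustration only, a props.conf stanza that turns on this behaviour could look like the one below. The stanza path is assumed from the monitor path in the question; the thread does not show the actual configuration involved.

```ini
# props.conf (illustrative; the source path is assumed from the question)
# CHECK_METHOD is set per source, so it belongs in a [source::...] stanza.
# With modtime, Splunk looks only at the file's modification time: any change,
# including a bare `touch`, causes the whole file to be indexed again.
[source::/usr/local/splunk/etc/apps/unix/local/remote_logs/machine1/*]
CHECK_METHOD = modtime
```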

If you don't think you've done so, can you list which sourcetypes Splunk is assigning to these files?

bitbuck3t
New Member

Thank you! I hadn't set that explicitly, but I do remember that Splunk was detecting these files as config_file, which uses modtime. I had overridden the sourcetype to syslog in the local props.conf, but apparently that does not pick up the CHECK_METHOD that syslog files normally get. I explicitly set CHECK_METHOD = endpoint_md5, and it works now, indexing only new events.
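A minimal sketch of what the resulting local props.conf might contain, assuming the sourcetype override lives in a [source::...] stanza matching the monitored directory. The file location and stanza path are assumptions; the thread does not show the exact stanza.

```ini
# $SPLUNK_HOME/etc/apps/unix/local/props.conf (location assumed)
# Matches every file directly under the monitored directory.
[source::/usr/local/splunk/etc/apps/unix/local/remote_logs/machine1/*]
# Override the automatically detected config_file sourcetype.
sourcetype = syslog
# Identify files by checksums of their beginning and end rather than by
# modtime, so an already-seen file is recognized and only appended data is read.
CHECK_METHOD = endpoint_md5
```

Events that were already indexed twice stay in the index; this setting only changes how the files are read from then on, and a Splunk restart is generally needed for props.conf input settings to take effect.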
