Getting Data In

How to configure Splunk to index identical files with different timestamps?

leon24
Explorer

Hi all,

I have a batch job that monitors my infrastructure health (basically doing "resource cluster" to check for resource statuses). The batch job is performed at 10 minute intervals.

The output is the same when there are no issues with my infrastructure. However, Splunk does not index the file because the contents are the same.

I have tried adding the following lines to props.conf (on the server where my forwarder is installed) under $SPLUNK_HOME\etc\system\local:

[source::D:\Program Files\Splunk.....\scripts\text.txt]
CHECK_METHOD = entire_md5

My $SPLUNK_HOME\etc\apps\cluster\local\inputs.conf has a [monitor://D:\Program Files\Splunk.....\scripts\text.txt] stanza with the correct settings, such as sourcetype.

I can monitor the file, but I can't get Splunk to index a new copy whose contents are identical and only the timestamp differs.

Am I missing something anywhere?

1 Solution

lguinn2
Legend

Add the following to the monitor stanza in $SPLUNK_HOME\etc\apps\cluster\local\inputs.conf:

crcSalt = <SOURCE>

Normally, Splunk checksums the beginning of the file (a CRC of the first 256 bytes by default) to determine whether it has already indexed the file. As you noticed, Splunk will not index a file again if that checksum matches one it has already seen. The crcSalt attribute above adds the full path of the source file to the checksum, so if two files have the same initial contents but different names, Splunk will still index both.
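To make the effect of the salt concrete, here is a minimal Python sketch. It is only an illustration of the idea, not Splunk's actual implementation: the function name `initial_crc`, the use of `zlib.crc32`, and the timestamped file names are all assumptions made for the example.

```python
import zlib

def initial_crc(content: bytes, source_path: str = "",
                salt_with_source: bool = False, window: int = 256) -> int:
    """Checksum the first `window` bytes, optionally mixing in the source path.

    Hypothetical helper: it mimics the idea behind crcSalt = <SOURCE>,
    not Splunk's real code.
    """
    data = content[:window]
    if salt_with_source:
        # The salt is prepended, so the same bytes under a different path
        # produce a different checksum.
        data = source_path.encode("utf-8") + data
    return zlib.crc32(data)

# Two runs of the health check with identical output (hypothetical names).
content = b"resource A: online\nresource B: online\n"
run1 = r"scripts\text_20240101_1000.txt"
run2 = r"scripts\text_20240101_1010.txt"

same_without_salt = initial_crc(content, run1) == initial_crc(content, run2)
same_with_salt = (initial_crc(content, run1, salt_with_source=True)
                  == initial_crc(content, run2, salt_with_source=True))

print(same_without_salt, same_with_salt)  # True False
```

Without the salt the two runs are indistinguishable, which is why the second file is skipped. With the path mixed in, each uniquely named file gets its own checksum.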

Now, your script needs to generate a unique file name for each run. This can be easily done by adding the timestamp to the file name in the script. If you want the data to be indexed with the same source name, you can set source=text.txt in inputs.conf to override the default source.
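As a rough sketch of the renaming step, written in Python rather than in the original batch file, something like the following would work. The helper name `timestamped_name` and the `_YYYYMMDD_HHMMSS` pattern are assumptions, not anything Splunk requires.

```python
from datetime import datetime
from pathlib import Path

def timestamped_name(base: str, now: datetime) -> str:
    """Insert a timestamp between the file stem and its extension.

    Hypothetical helper: any scheme that yields a unique name per run works.
    """
    p = Path(base)
    return f"{p.stem}_{now:%Y%m%d_%H%M%S}{p.suffix}"

# Fixed time used here so the result is reproducible; a real job would
# pass datetime.now() instead.
name = timestamped_name("text.txt", datetime(2024, 1, 1, 10, 0, 0))
print(name)  # text_20240101_100000.txt
```

Note that once the file name changes on every run, the monitor stanza has to match the new names as well, for example with a wildcard in the monitored path.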

AFAIK, there is no way for Splunk to access the timestamp as a way of discriminating between files on input.

leon24
Explorer

Hi lguinn,

Yes, thanks, your method works.

What I did was the following, according to what you have mentioned:

  1. Added timestamp to the filename created by the batch file
  2. Added crcSalt = <SOURCE> to inputs.conf

However, from the props.conf documentation:

CHECK_METHOD = [endpoint_md5|entire_md5|modtime]
* Set CHECK_METHOD = endpoint_md5 to have Splunk checksum the first and last 256 bytes of a
file. When it finds matches, Splunk lists the file as already indexed and indexes only new
data, or ignores it if there is no new data.
* Set CHECK_METHOD = entire_md5 to use the checksum of the entire file.
* Set CHECK_METHOD = modtime to check only the modification time of the file.
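The difference between the three methods can be pictured with a small Python sketch. It only mirrors the wording of the documentation quoted above and is not Splunk's implementation; the function name `fingerprint`, the temporary file, and the chosen timestamps are assumptions for the example.

```python
import hashlib
import os
import tempfile

METHODS = ("endpoint_md5", "entire_md5", "modtime")

def fingerprint(path: str, method: str) -> str:
    """Return a value that changes when the file counts as 'changed'."""
    with open(path, "rb") as f:
        data = f.read()
    if method == "endpoint_md5":
        # First and last 256 bytes only.
        return hashlib.md5(data[:256] + data[-256:]).hexdigest()
    if method == "entire_md5":
        return hashlib.md5(data).hexdigest()
    if method == "modtime":
        return str(os.stat(path).st_mtime_ns)
    raise ValueError(f"unknown method: {method}")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text.txt")
    first_run = 1_700_000_000 * 10**9        # arbitrary time, in nanoseconds
    second_run = first_run + 600 * 10**9     # ten minutes later

    with open(path, "wb") as f:
        f.write(b"all resources online\n")
    os.utime(path, ns=(first_run, first_run))
    before = {m: fingerprint(path, m) for m in METHODS}

    # The next run writes exactly the same content at a later time.
    with open(path, "wb") as f:
        f.write(b"all resources online\n")
    os.utime(path, ns=(second_run, second_run))
    after = {m: fingerprint(path, m) for m in METHODS}

changed = {m: before[m] != after[m] for m in METHODS}
print(changed)
```

In this sketch only the modtime fingerprint changes between the two runs, because both MD5 variants depend on content alone.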

Shouldn't adding CHECK_METHOD = modtime suffice for my case instead of needing to add crcSalt and change the filename with timestamp?

Regards,
Leon

lguinn2
Legend

I haven't ever tried that option, but yes, I think that CHECK_METHOD = modtime should work in your case.
