Getting Data In

Splunk indexes

elaine0102
Explorer

Hi, my current situation is that I have a log of 400 events, and it will grow as new data arrives.

Let's say my log now has 400 events and I have a .bat that runs every 60s. Every time Splunk runs the .bat, it indexes another 400 events, then another 400, and so on. As a result I end up with many duplicated events (after the .bat runs 4 times, I have 4 x 400 duplicated events).

Any advice? I would like Splunk to index only my latest data (400 events) each time the .bat runs.


sdaniels
Splunk Employee

If Splunk is monitoring a file and the header (by default 256 bytes) does not change and file name doesn't change, Splunk should remember where it is in the file and when new data is appended, Splunk will just pull in the new data. I would try the crcSalt=<SOURCE> option to see if that makes sure you don't get the duplicates. Something doesn't seem right if you are getting duplicates.

I know you aren't rolling this log but there are some relevant pieces here to understand how Splunk monitors the file. Also relevant to your other question.

http://docs.splunk.com/Documentation/Splunk/5.0/Data/Howlogfilerotationishandled

The monitoring processor picks up new files and reads the first and last 256 bytes of the file. This data is hashed into a begin and end cyclic redundancy check (CRC). Splunk checks new CRCs against a database that contains all the CRCs of files Splunk has seen before. The location Splunk last read in the file, known as the file's seekPtr, is also stored.

There are three possible outcomes of a CRC check:

1. There is no begin and end CRC matching this file in the database. This indicates a new file. Splunk will pick it up and consume its data from the start of the file. Splunk updates the database with the new CRCs and seekPtrs as the file is being consumed.

2. The begin CRC and the end CRC are both present, but the size of the file is larger than the seekPtr Splunk stored. This means that, while Splunk has seen the file before, there has been data added to it since it was last read. Splunk opens the file, seeks to the previous end of the file, and starts reading from there. In this way, Splunk will only grab the new data and not anything it has read before.

3. The begin CRC is present, but the end CRC does not match. This means that Splunk has previously read the file but that some of the material that it read has since changed. In this case, Splunk must re-read the whole file.
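
The three outcomes above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not Splunk's actual implementation: the names `check_file`, `db`, `end_crc` and `seek_ptr` are made up for the example, `zlib.crc32` stands in for whatever hash Splunk really uses, and it assumes the stored end CRC covers the 256 bytes just before the stored seekPtr.

```python
import zlib

WINDOW = 256  # bytes hashed at each end of the file, per the description above


def check_file(content, db):
    """Classify a file against the CRC database and return (outcome, data_to_index).

    db maps a begin CRC to {"end_crc": ..., "seek_ptr": ...} (hypothetical layout).
    """
    begin = zlib.crc32(content[:WINDOW])
    record = db.get(begin)
    if record is None:
        # Outcome 1: no matching CRC, so this is a new file; read it all.
        outcome, new_data = "new", content
    else:
        seek = record["seek_ptr"]
        # Assumption: re-hash the 256 bytes that ended at the old seekPtr.
        tail = content[max(0, seek - WINDOW):seek]
        if len(content) >= seek and zlib.crc32(tail) == record["end_crc"]:
            # Outcome 2: already-read data is intact; read only what was appended.
            new_data = content[seek:]
            outcome = "appended" if new_data else "unchanged"
        else:
            # Outcome 3: previously read material changed; re-read the whole file.
            outcome, new_data = "changed", content
    # Remember the CRC of the current tail and how far we have read.
    db[begin] = {"end_crc": zlib.crc32(content[-WINDOW:]), "seek_ptr": len(content)}
    return outcome, new_data


db = {}
log = b"header line\n" + b"event\n" * 400
print(check_file(log, db)[0])                  # first sighting of the file
print(check_file(log + b"event 401\n", db))    # only the appended event is returned
```

In this sketch, a file that is rewritten in place (rather than appended to) falls into the third branch and is returned in full, which is one way repeated full re-indexing can arise.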

Important: Since the CRC start check is run against only the first 256 bytes of the file, it is possible for non-duplicate files to have duplicate start CRCs, particularly if the files have identical headers. To handle such situations you can use the crcSalt attribute when configuring the file in inputs.conf. The crcSalt attribute ensures that each file has a unique CRC. You do not want to use this attribute with rolling log files, however, because it defeats Splunk's ability to recognize rolling logs and will cause Splunk to re-index the data.

justgovind30198
Explorer

Hi sdaniels,

The third outcome you mentioned is not re-indexing the file in my case.
My begin CRC matches but my end CRC does not (I changed some content in the last 256 bytes), yet Splunk still does not index the file again.

Please help.


sdaniels
Splunk Employee

Yes, just place it in inputs.conf and restart the server.
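
For reference, a monitor stanza with crcSalt set might look like the text printed by the small Python helper below. The helper name `monitor_stanza` and the path `C:\logs\app.log` are hypothetical and only for illustration; substitute the file you are actually monitoring, and treat this as a sketch rather than a tested configuration.

```python
def monitor_stanza(log_path, crc_salt="<SOURCE>"):
    """Build the text of an inputs.conf monitor stanza with crcSalt set."""
    return f"[monitor://{log_path}]\ncrcSalt = {crc_salt}\n"


# Hypothetical log location; replace with the file Splunk monitors.
print(monitor_stanza(r"C:\logs\app.log"))
```

The printed two lines are what you would add to inputs.conf by hand before restarting.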

elaine0102
Explorer

Hi, thank you for your reply. So if I want to try out "crcSalt", can I just add it manually to inputs.conf?
