Getting Data In

Splunk is re-indexing entire file, not just changes

thesteve
Path Finder

I have a vendor-provided log file (which I have no way to change) that has both a changing header and a changing footer.

In between are the log lines, one entry per line.

The problem I am facing is that Splunk re-indexes the entire file whenever it is updated, not just the newly added lines.

I am filtering out the header and footer using two entries in transforms.conf:

[setNull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[dateAndData]
DEST_KEY = queue
REGEX    = ^(Fri|Sat|Sun|Mon|Tue|Wed|Thu).{22}
FORMAT   = indexQueue
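For readers less familiar with this filtering pattern, here is a minimal Python sketch of what the two stanzas amount to. It assumes both transforms are applied in the order setNull, then dateAndData (the props.conf entry that would do this is not shown in the post). The `route` function and the sample lines are hypothetical and only mimic the routing decision, not Splunk's actual pipeline.

```python
import re

# Patterns copied from the two transforms.conf stanzas in the post.
SET_NULL = re.compile(r".")
DATE_AND_DATA = re.compile(r"^(Fri|Sat|Sun|Mon|Tue|Wed|Thu).{22}")


def route(line: str) -> str:
    """Return the queue a line ends up in if both transforms run in order."""
    queue = "indexQueue"            # where events go when no transform intervenes
    if SET_NULL.search(line):       # [setNull]: any non-empty line is discarded...
        queue = "nullQueue"
    if DATE_AND_DATA.search(line):  # [dateAndData]: ...unless it looks like a log entry
        queue = "indexQueue"
    return queue


# Hypothetical sample lines, not taken from the vendor file.
samples = [
    "=== report generated 2014-01-06 10:00 ===",
    "Mon Jan 06 10:00:01 2014 service started",
    "total entries: 1",
]
for line in samples:
    print(f"{route(line):10s} {line}")
```

Only lines that start with a weekday abbreviation followed by at least 22 more characters are kept; the header and footer fall through to the null queue.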

Is there anything I can do to get Splunk to recognize the data that it has already indexed? The only thing I can think of at this point is writing my own script to extract data from the file, and I'd rather not do that unless it is absolutely necessary.

lguinn2
Legend

Splunk is designed to read a file from beginning to end. Changing information at the beginning or in the middle of a file can confuse it.

When Splunk opens a file, it computes a checksum (CRC) over the first 256 bytes by default and uses it to determine whether it has seen the file before. If the header changes (as yours seems to), the checksum changes too, so Splunk says "aha - a new file" and indexes the data again.
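The effect can be illustrated with a small Python sketch. It is only an analogy: it uses `zlib.crc32` over the first 256 bytes, which is not necessarily the checksum Splunk computes internally, and the `initial_crc` helper and file contents are made up for the example.

```python
import zlib


def initial_crc(content: bytes, length: int = 256) -> int:
    """Checksum over the first `length` bytes, standing in for the file-identity check."""
    return zlib.crc32(content[:length])


# Hypothetical file: one header line followed by log entries.
body = b"Mon Jan 06 10:00:01 2014 event\n" * 20
original = b"# last updated 10:00\n" + body

# Case 1: lines are appended, the header is untouched.
appended = original + b"Mon Jan 06 10:05:00 2014 event\n"

# Case 2: the header itself is rewritten (same length, different content).
changed_header = b"# last updated 10:05\n" + body

print(initial_crc(appended) == initial_crc(original))        # same leading bytes
print(initial_crc(changed_header) == initial_crc(original))  # header differs
```

Appending leaves the leading bytes, and so the checksum, unchanged, while rewriting the header produces a different checksum and the file looks new.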

You might be able to set the following in your inputs.conf to stop this:

[monitor:///yourmonitorstanza]
initCrcLength = 50
crcSalt = <SOURCE>

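This assumes that the first 50 bytes will NOT change, so you might need to adjust this to an even smaller number. Check the inputs.conf specification for your Splunk version first, though: recent versions document initCrcLength as accepting values from 256 to 1048576, so a smaller value may not take effect. Setting crcSalt = <SOURCE> adds the file's full path to the checksum, which helps to ensure that Splunk will not confuse this file with another file that has the same first 50 bytes but a different name.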
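The reasoning behind the two settings can be sketched in Python as follows. This is an illustration under assumptions, not Splunk's implementation: `salted_crc` is a hypothetical helper that mixes the source path into a `zlib.crc32` over the first `length` bytes, and the paths and file contents are invented.

```python
import zlib


def salted_crc(content: bytes, source: str, length: int) -> int:
    """CRC over the source path followed by the first `length` bytes of the file."""
    salt = zlib.crc32(source.encode("utf-8"))
    return zlib.crc32(content[:length], salt)


# Hypothetical file whose first line is stable and whose second line changes.
STABLE = b"# vendor log, format v2; fields: time level message\n"
body = b"Mon Jan 06 10:00:01 2014 event\n" * 10
before = STABLE + b"# last updated 10:00\n" + body
after = STABLE + b"# last updated 10:05\n" + body + b"Mon Jan 06 10:05:00 2014 event\n"

path = "/var/log/vendor/app.log"          # invented path
other_path = "/var/log/vendor/old.log"    # invented path

# A short window covers only the stable part, so the file is recognized.
print(salted_crc(before, path, 50) == salted_crc(after, path, 50))
# A 256-byte window reaches the changing line, so the file looks new.
print(salted_crc(before, path, 256) == salted_crc(after, path, 256))
# Same leading bytes under a different name give a different checksum.
print(salted_crc(before, path, 50) == salted_crc(before, other_path, 50))
```

The window has to end before the first byte that changes, and the salt keeps two files with identical leading bytes apart as long as their paths differ.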
