Solved: How to force Splunk to reindex renamed log files?

gesman · 06-26-2015

We want to monitor situations where a log file gets renamed to a different name within the same directory or moved to another directory (under the same or different filename).
Re-indexing the contents of the renamed log file is the preferred approach. We don't care about duplicate events, but the fact that the log file got renamed is an important event in itself that we need to monitor.

By default, Splunk does not reindex a renamed log file whose content is unchanged. How can I override this behavior?

acharlieh · 06-26-2015

Be careful! Attributes in Splunk config files are case-sensitive! So the correct entry to add to each inputs.conf stanza that you want reindexed upon rename is:

crcSalt = <SOURCE>

It is NOT CRCSALT =, as in @woodcock's answer above.

This works because, by default, Splunk doesn't track files by name. Instead, it calculates a cyclic redundancy check (CRC) on the first 256 bytes of the file (a length controlled by initCrcLength) and uses that as the file's identifier. The reasoning is that when you roll a log file, most of the time you do not want to reindex its contents. crcSalt is a string added to this initial CRC calculation to force reindexing. The special value <SOURCE> means the file's full source path is used as the salt.
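To make the identification scheme above concrete, here is a minimal Python sketch. It is a simplified model, not Splunk's actual implementation: using zlib.crc32 and prefixing the path to the first 256 bytes are assumptions made for illustration, and file_id is a hypothetical helper.

```python
import zlib

INIT_CRC_LENGTH = 256  # mirrors the default initCrcLength mentioned above


def file_id(content: bytes, path: str, use_source_salt: bool) -> int:
    """Hypothetical, simplified file identifier: CRC32 over the first
    INIT_CRC_LENGTH bytes, optionally prefixed with the path as a salt."""
    salt = path.encode("utf-8") if use_source_salt else b""
    return zlib.crc32(salt + content[:INIT_CRC_LENGTH])


log = b"2015-06-26 12:00:00 service started\n" * 20  # 720 bytes

# Appending data beyond the first 256 bytes leaves the identifier unchanged.
grown_same = file_id(log, "/var/log/app/a.log", False) == file_id(
    log + b"more data\n", "/var/log/app/a.log", False
)

# Rename a.log -> b.log without a salt: same identifier, so the file looks already seen.
unsalted_same = file_id(log, "/var/log/app/a.log", False) == file_id(
    log, "/var/log/app/b.log", False
)

# The same rename with the path as salt: the identifier changes, so the file looks new.
salted_same = file_id(log, "/var/log/app/a.log", True) == file_id(
    log, "/var/log/app/b.log", True
)

print(grown_same, unsalted_same, salted_same)  # True True False
```

The two paths have the same length and differ in a single byte, and CRC-32 always detects such a small difference, so the salted identifiers are guaranteed to differ here.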

But I'm not certain this will actually get you what you want. Assuming well-formed log files, the event time is parsed from the entries within the files themselves, so with the setup above your searches will return duplicate events from different sources with identical timestamps. Yes, _indextime could be used to work out which source came first, but events in Splunk are stored and searched in _time order, so that could be very inefficient, especially as your log files grow. Also, because you are only salting the CRC with the file's path, if fileA is renamed to fileB and then back to fileA, you won't capture the rename back.
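The rename-back limitation can be sketched by replaying a sequence of renames against a set of already-seen identifiers. This is a hypothetical model (salted_id and replay are illustrative names): it assumes the salt is simply the path prepended to the first 256 bytes, and it ignores how far into each file Splunk has already read, so it only shows whether a file would be treated as new.

```python
import zlib

INIT_CRC_LENGTH = 256  # default initCrcLength


def salted_id(content: bytes, path: str) -> int:
    # Simplified model of crcSalt = <SOURCE>: CRC32 over the path plus the file's first bytes.
    return zlib.crc32(path.encode("utf-8") + content[:INIT_CRC_LENGTH])


def replay(paths: list, content: bytes) -> list:
    """Return the paths that would be treated as new files, in order."""
    seen = set()
    indexed = []
    for path in paths:
        fid = salted_id(content, path)
        if fid not in seen:  # an unseen identifier means "new file", so index it
            seen.add(fid)
            indexed.append(path)
    return indexed


data = b"header line\n" + b"x" * 500
history = ["/data/fileA", "/data/fileB", "/data/fileA"]  # A -> B -> back to A
print(replay(history, data))  # ['/data/fileA', '/data/fileB']
```

The final rename back to fileA reproduces an identifier that was already recorded, so nothing marks that rename.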

Instead, may I propose a file monitoring system such as inotify that writes a log entry for each file rename? You could then index that as a separate data source if renames are what you are interested in.
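As a rough illustration of a rename logger, here is a polling sketch in Python. Note that it swaps in a different technique than inotify: it periodically walks a directory tree recursively and matches files by (device, inode) number, assuming a POSIX filesystem where a rename keeps the inode. All function names, the log file name, and the key=value log format are hypothetical. An inode reused by a newly created file after a deletion could produce a false positive.

```python
import os
import time


def snapshot(root: str) -> dict:
    """Map (st_dev, st_ino) -> path for every file under root, including subfolders."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file disappeared between listing and stat
            files[(st.st_dev, st.st_ino)] = path
    return files


def diff_renames(before: dict, after: dict) -> list:
    """Return (old_path, new_path) pairs for files whose inode survived but whose path changed."""
    return [
        (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    ]


def watch(root: str, interval: float = 5.0, log_path: str = "renames.log") -> None:
    """Poll forever, appending one key=value line per detected rename."""
    before = snapshot(root)
    while True:
        time.sleep(interval)
        after = snapshot(root)
        renames = diff_renames(before, after)
        if renames:
            with open(log_path, "a") as log:
                for old, new in renames:
                    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
                    log.write(f"{stamp} action=rename old={old} new={new}\n")
        before = after
```

Running something like watch("/var/log/app") in the background would produce a renames.log that can be monitored as its own source. Polling trades latency for simplicity and covers subfolders without extra setup.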


gesman · 06-26-2015

Yes, it does exactly what I need.

And yes, I am already using _indextime for what I need. Hint: the files I am indexing do not contain timestamped events, so _indextime is the only time reference I have.
Take care to configure crcSalt before enabling (or populating) this data source; otherwise Splunk would needlessly reindex everything.
inotify would probably be a cleaner way to detect the act of renaming itself, but I also need access to the latest file contents. Its drawback is that it does not monitor subfolders recursively. So in my case I made Splunk do inotify's job better.
Renaming back to the same filename is not a problem in my case, because I'll still have access to the latest content of fileA (even after a second or third rename).

lguinn2 · 06-26-2015

@woodcock fixed his answer (dang the markdown sometimes)!

I like the suggestion of using inotify or a similar system, as it gets directly at what you are trying to monitor: the action of renaming a file.

gesman · 06-26-2015

inotify does not do recursive monitoring, and I also want to avoid adding too many moving parts outside of Splunk.

acharlieh · 06-26-2015

The markdown is indeed being persnickety. The special value I'm trying to reference is the literal value I said to set (<SOURCE>), which the linked documentation calls out as a special case.
