We want to monitor situations where a log file gets renamed to a different name within the same directory or moved to another directory (under the same or different filename).
Re-indexing the contents of the renamed log file is the preferred approach - we don't care about duplicate events, but the fact that the log file got renamed is an important event by itself that we need to monitor.
By default, Splunk does not re-index a renamed log whose content is unchanged. How can we override this behavior?
Be careful! Attribute names in Splunk config files are case sensitive! Therefore the correct entry to add to each stanza in inputs.conf that you want to re-index upon rename is actually:
crcSalt = <SOURCE>
It is NOT CRCSALT = <SOURCE>
as @woodcock mentions above.
How this works: by default Splunk does not track files by filename. Instead it calculates a cyclic redundancy check (CRC) on the first 256 bytes of the file (the default, controlled by initCrcLength) and uses that as the identifier for the file. The thinking is that when a log file rolls, most of the time you do not want to re-index its contents. crcSalt is a string added to the calculation of the initial CRC to force re-indexing. The special value <SOURCE> means to use the file's full path (its source) as the salt value.
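To make the mechanism concrete, here is a minimal Python sketch of the idea. It is only an illustration: the function name file_identity, the use of zlib.crc32, and the choice to prepend the salt are my assumptions, not Splunk's actual implementation.

```python
import zlib
from typing import Optional

INIT_CRC_LENGTH = 256  # mirrors the default initCrcLength described above


def file_identity(content: bytes, path: str, salt: Optional[str] = None) -> int:
    """Toy stand-in for the initial-CRC check (hypothetical, not Splunk's code)."""
    head = content[:INIT_CRC_LENGTH]
    if salt == "<SOURCE>":
        salt = path  # special value: salt with the file's own path
    if salt:
        # Assumption: the salt is mixed in by prepending it to the hashed bytes.
        head = salt.encode("utf-8") + head
    return zlib.crc32(head)


data = b"2024-01-01 service started\n" * 20

# Without a salt, a renamed file keeps the same identity and is skipped.
unsalted_same = file_identity(data, "/logs/fileA") == file_identity(data, "/logs/fileB")

# With the path as salt, the rename yields a new identity, so it is re-indexed.
salted_differs = (file_identity(data, "/logs/fileA", "<SOURCE>")
                  != file_identity(data, "/logs/fileB", "<SOURCE>"))

# Renaming back to fileA reproduces the identity that was already seen.
rename_back_same = (file_identity(data, "/logs/fileA", "<SOURCE>")
                    == file_identity(data, "/logs/fileA", "<SOURCE>"))
```

The last comparison is the fileA to fileB and back to fileA case: the salted identity for fileA is the same as the first time around, so nothing looks new.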
But I'm not certain this will actually get you what you want. Assuming well-formed log files, the event time is parsed from the entries within the log files themselves, so if you pursue the above you'll wind up with duplicate log entries when searching: different sources, but the same time. Yes, _indextime could be used to try to figure out which source came before which, but events in Splunk are stored and searched in _time order, so that could be really inefficient, especially as your log files get big. Not to mention that you're now just salting the file with the filename: if fileA is renamed to fileB and then back to fileA, you won't capture the rename back.
Instead, could I propose implementing a file monitoring system such as inotify, having it write a log of file renames, and indexing that log as a separate source of data if you are interested in renames?
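As a rough sketch of what such a rename logger could look like, the Python below compares two recursive snapshots of a directory tree and reports files whose inode stayed the same while the path changed. Note that this is polling, not inotify itself, and the names snapshot and detect_renames are hypothetical. It assumes a filesystem where a rename preserves the inode, and it does not handle hard links.

```python
import os


def snapshot(root):
    """Map (device, inode) -> path for every file under root, recursively."""
    seen = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file vanished between the walk and the stat
            seen[(st.st_dev, st.st_ino)] = path
    return seen


def detect_renames(before, after):
    """Return sorted (old_path, new_path) pairs for files that kept their inode."""
    return sorted(
        (old, after[key])
        for key, old in before.items()
        if key in after and after[key] != old
    )
```

Calling snapshot on a schedule and writing each pair from detect_renames as one line to a log file would give Splunk a small, separate source to monitor for renames.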
Yes, it does exactly what I need.
And yes, I am already using _indextime for the necessary tasks. Hint: these are not log files with events that I am indexing, so _indextime is the only time reference that I have and use.
Care needs to be taken to configure crcSalt before enabling (or populating) this data source; otherwise Splunk would unnecessarily re-index everything.
inotify would probably be a cleaner way to detect the pure act of renaming, but I also need access to the latest file contents. Its drawback is that it does not monitor recursively inside subfolders. So in my case I made Splunk do inotify's job, and do it better.
Renaming back to the same filename is not a problem (for my case) because I'll still have access to the latest content of fileA (even after second or third rename).
@woodcock fixed his answer (dang the markdown sometimes)!
I like the suggestion of using inotify or a similar system, as it gets directly at what you are trying to monitor: the action of renaming a file.
inotify does not do recursive monitoring, and I also want to avoid adding too many moving parts outside of Splunk.
The markdown is indeed being persnickety. The special value I'm trying to reference is the literal value that I mention to set, and is called out in the linked documentation as a special case.