Indexing historical (rotated) logs

Builder

Hi,

When indexing from scratch, what is the recommended way to deal with historical data, i.e. logs that have previously been rotated? One option would be to monitor all of these. This solution is fairly simple to implement, since both the log file and the rotated log files can be added with a single stanza in inputs.conf. However, there is no real need to keep monitoring the rotated log files once they have been indexed. So the other option is apparently to use spool or oneshot. Which is wiser? Is there any real motivation to monitor rotated files?

Also, does spool actually have any advantage over oneshot? It seems that spool causes the source to show $SPLUNK_HOME/var/spool, which isn't ideal. I realise I can override this with -rename-source.
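For reference, here is a sketch of the two one-time approaches. The paths and sourcetype are illustrative, and the exact CLI options may vary by Splunk version, so check `splunk help oneshot` in your environment:

```shell
# oneshot: index the file immediately (illustrative path and sourcetype)
$SPLUNK_HOME/bin/splunk add oneshot /var/log/audit.log.1 -sourcetype linux_audit

# spool: drop a copy into the sinkhole directory; Splunk indexes and removes it.
# Note: the source field will then show the spool path unless overridden
# (e.g. with -rename-source, as mentioned above).
cp /var/log/audit.log.1 $SPLUNK_HOME/var/spool/splunk/
```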


SplunkTrust

Hi echalex,

Just monitor your log directory with a monitor stanza. Splunk will take care of which files must be indexed and which ones have already been indexed.
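For example, a minimal monitor stanza in inputs.conf might look like this (the path, sourcetype, and blacklist pattern are illustrative, not from the original thread):

```
# inputs.conf -- one stanza covers the live log and its rotated siblings
[monitor:///var/log/myapp]
sourcetype = myapp
# optionally exclude files you do not want indexed, e.g. compressed archives
blacklist = \.gz$
```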

Regarding spool, the docs say:

Copy the file into Splunk via the sinkhole directory. This command is similar to add oneshot, except that the file gets spooled from the sinkhole directory, rather than added immediately.

kind regards


Legend

It is true that Splunk will know which of your rotated files have already been indexed. Splunk isn't going to reindex "audit.log" when it rolls to "audit.log.1" -- that's good.

However, Splunk will continue to monitor "audit.log.1" -- just in case something new gets added to it! Of course, nothing should ever be added to this file now. Once you have thousands of files in a directory, it consumes some resources just for Splunk to poll each file to see if it has been changed.

As a best practice, you should move older, inactive log files to another directory or archive them offline. This will allow Splunk to monitor fewer files and poll them more quickly, decreasing the time between when an event occurs and when Splunk detects and indexes it. This becomes increasingly important as the number of files in the directory grows.
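As an illustration of that housekeeping step, the sketch below moves rotated files out of the monitored directory. It runs against a throwaway temp directory; the audit.log names and the `*.log.[0-9]*` pattern are just examples, so adapt them to your own rotation scheme:

```shell
# Demo in a temp directory so nothing real is touched
LOG_DIR=$(mktemp -d)
ARCHIVE_DIR="$LOG_DIR/archive"
mkdir -p "$ARCHIVE_DIR"
touch "$LOG_DIR/audit.log" "$LOG_DIR/audit.log.1" "$LOG_DIR/audit.log.2"

# Move rotated files (audit.log.1, audit.log.2, ...) out of the monitored
# directory, leaving only the live audit.log behind
find "$LOG_DIR" -maxdepth 1 -name '*.log.[0-9]*' -exec mv {} "$ARCHIVE_DIR"/ \;
```

After this, a monitor stanza on the directory only has the live file left to poll.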

This advice only applies if you are asking Splunk to monitor a directory. If your stanzas in inputs.conf refer to individual files, then this will not be an issue in your environment.


Legend

echalex, you should worry about the huge number of watched files...

Builder

Regarding the spool/oneshot question, I had read the documentation, but I still don't directly see the benefit of one over the other. One thing I noticed is that "add oneshot" won't take wildcards.
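Since "add oneshot" takes a single file, one workaround is to let the shell expand the wildcard and loop over the matches. This is an untested sketch with illustrative paths and sourcetype:

```shell
# Feed each rotated file to oneshot individually,
# since the command itself won't expand wildcards
for f in /var/log/myapp/audit.log.[0-9]*; do
    "$SPLUNK_HOME/bin/splunk" add oneshot "$f" -sourcetype myapp
done
```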


Builder

Thanks MuS!
That's basically how we're doing it right now, but I was simply unsure whether this was the recommended way. The solution does require configuring whitelists and blacklists, and setting the sourcetypes in props.conf.

The setup seems to work, but I was slightly concerned about the lines in splunkd.log mentioning that the CRC has already been seen in conjunction with rotated files. However, I do not seem to need a CRC salt.

What also concerns me is the huge number of watched files this results in, even though ignored with white/blacklist. "splunk list monitor" outputs 6329 lines.
