In system/default/inputs.conf, I see a stanza like this ...
[monitor://$SPLUNK_HOME/var/log/splunk]
I don't see a file mask at the end of the path, so I assume it indexes everything in the directory, which does appear to be the case. The odd thing is that the logs in that folder rotate when they reach a certain size, keeping only 5 rotated copies, and Splunk appears to be indexing the rotated copies as well, since the monitor stanza just pulls in everything from the log folder.
Is this expected behaviour? Indeed if I run this ...
| tstats count(_time) WHERE index=_internal source="*metrics.log*" by source
I see entries like this ...
C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log
C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log.1
C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log.2
C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log.3
/opt/splunkforwarder/var/log/splunk/metrics.log
/opt/splunkforwarder/var/log/splunk/metrics.log.1
/opt/splunkforwarder/var/log/splunk/metrics.log.2
etc ...
So after talking about this with tech support and looking at my own system a bunch, I have figured out what is going on. Splunk is designed so that it will not re-index rolled-over logs. Even though the file name changes, it recognizes that the log file has already been indexed. The Splunk documentation on log file rotation describes how rotated files are handled without re-indexing the data. That is why it is OK for them to say
[monitor://$SPLUNK_HOME/var/log/splunk]
instead of
[monitor://$SPLUNK_HOME/var/log/splunk/*.log]
In fact the first one is right, and the reason is log file rotation. When the metrics.log file rotates, Splunk may not yet have had a chance to index the tail of that log. So, when metrics.log is renamed to metrics.log.1, Splunk looks at the file and realizes that it has already indexed most of it, but that there is a bit at the end that has not been indexed. It indexes that part, and since the file name is now metrics.log.1 and not metrics.log, that is what is reflected in the source. If the monitor stanza were set to only pull in *.log, the tail portion that had not been indexed would never get pulled in, as metrics.log.1 would not match the file mask *.log.
I verified this by looking at some of the data that was indexed in metrics.log.1 and found no corresponding entries in a metrics.log source. So, it isn't duplicating the data and the world is still turning just fine.
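For anyone who wants to see the idea in miniature, here is a rough Python sketch of the behavior described above. It is a toy model, not Splunk's code: the names ToyMonitor, demo and INIT_CRC_LENGTH are made up for illustration, and it assumes a file is identified only by a CRC of its first 256 bytes plus a saved read offset. The real monitor does more checking than this.

```python
import zlib

INIT_CRC_LENGTH = 256  # assumed: bytes of the file head used as its identity


class ToyMonitor:
    """Simplified model of rotation-aware tailing; not Splunk's actual code."""

    def __init__(self):
        self.seek_by_crc = {}  # head CRC -> number of bytes already indexed
        self.indexed = []      # (source, text) pairs, in indexing order

    def scan(self, files):
        """files maps a path to the file's current content as bytes."""
        for path, data in files.items():
            # Identity comes from the content of the head, not from the name.
            head_crc = zlib.crc32(data[:INIT_CRC_LENGTH])
            seek = self.seek_by_crc.get(head_crc, 0)
            if len(data) > seek:
                # Only the unread remainder is indexed, under the current name.
                self.indexed.append((path, data[seek:].decode()))
                self.seek_by_crc[head_crc] = len(data)


def demo():
    monitor = ToyMonitor()
    head = b"A" * 300  # stands in for the first lines of metrics.log

    # Pass 1: the live file is read up to its current end.
    monitor.scan({"metrics.log": head})

    # More lines arrive, then the file rotates before the monitor reads them.
    rotated = head + b"tail-written-before-rotation\n"
    fresh = b"B" * 300  # the new metrics.log starts with different content
    monitor.scan({"metrics.log.1": rotated, "metrics.log": fresh})
    return monitor.indexed


if __name__ == "__main__":
    for source, text in demo():
        print(source, len(text))
```

In this model the renamed file is recognized by its head, so only the unread tail is indexed, and it shows up with metrics.log.1 as its source. A monitor restricted to *.log would never be handed metrics.log.1, so that tail would be lost.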
Yes, that is expected behavior based on the input definition, but it is not optimal since it is duplicating data. I suggest changing the input (in system/local/inputs.conf) to [monitor://$SPLUNK_HOME/var/log/splunk/*.log].
Seems like a config oversight to me. Our forwarders are putting tons of data into the _internal index, and most of the data in that index comes from the metrics.log files. Seems to me they could save a LOT of space in the _internal index by not pulling in the rolled-over files by default.
Consider filing a bug report.
I already did. Thanks. 🙂