Monitoring Splunk

Can you help me with some questions I have about the metrics.log and _internal index?

dstuder
Communicator

In system/default/inputs.conf, I see a stanza like this ...

[monitor://$SPLUNK_HOME/var/log/splunk]

I don't see a file mask at the end of the path, so I assume it indexes everything in the directory, which does appear to be the case. The odd thing is that the logs in that folder rotate when they reach a certain size, and only 5 rotated logs are kept. Splunk appears to be indexing the rotated logs as well, since the monitor stanza pulls in everything in the log folder.

Is this expected behaviour? If I run this search ...

| tstats count(_time) WHERE index=_internal source="*metrics.log*" by source

I see entries like this ...

C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log
C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log.1
C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log.2
C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log.3
/opt/splunkforwarder/var/log/splunk/metrics.log
/opt/splunkforwarder/var/log/splunk/metrics.log.1
/opt/splunkforwarder/var/log/splunk/metrics.log.2
etc ...
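To make the file-mask question concrete, here is a small sketch that filters one host's file names from the listing above. It assumes Python's `fnmatch` as a stand-in for Splunk's own wildcard matching (an approximation, not Splunk's rules), and `candidates` is a hypothetical helper: with no mask every file in the directory is a candidate, while a mask such as `*.log` keeps only the live file.

```python
from fnmatch import fnmatch

# Rotated file names from one host in the listing above.
FILES = ["metrics.log", "metrics.log.1", "metrics.log.2", "metrics.log.3"]


def candidates(files, mask=None):
    """Return the files a monitor stanza would consider (hypothetical helper).

    mask=None models a directory stanza with no file mask, which picks up
    every file; a mask such as "*.log" keeps only matching names.
    fnmatch only approximates Splunk's own wildcard rules.
    """
    if mask is None:
        return list(files)
    return [name for name in files if fnmatch(name, mask)]


print(candidates(FILES))           # directory stanza: all four files
print(candidates(FILES, "*.log"))  # masked stanza: only metrics.log
```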

dstuder
Communicator

So, after talking about this with tech support and looking at my own system a bunch, I have figured out what is going on. Splunk is designed so that it will not re-index rolled-over logs: even though the file name changes, it recognizes that the log file has already been indexed. This document talks about how Splunk handles log file rotation without re-indexing the data. That is why it is OK for them to say

[monitor://$SPLUNK_HOME/var/log/splunk]

instead of

[monitor://$SPLUNK_HOME/var/log/splunk/*.log]

In fact, the first one is correct, and the reason is log file rotation. When metrics.log rotates, Splunk may not yet have indexed the tail of that file. So when metrics.log is renamed to metrics.log.1, Splunk looks at the file, recognizes that it has already indexed most of it, and finds a bit at the end that has not been indexed. It indexes that part, and since the file name is now metrics.log.1 rather than metrics.log, that is what is reflected in the source. If the monitor stanza were set to pull in only *.log, the tail portion that had not been indexed would never be pulled in, because metrics.log.1 does not match the file mask *.log.
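The rotation argument can be modeled in a few lines. This is a toy sketch, not Splunk's implementation: it assumes a file is recognized by a CRC of its first 256 bytes (inputs.conf exposes a related setting, `initCrcLength`) plus a remembered read offset, models a directory as a dict of file name to contents, and uses hypothetical names (`ToyTailer`, `scan`, `rotation_demo`).

```python
import zlib
from fnmatch import fnmatch

HEAD_BYTES = 256  # assumed head length used to recognize a file (cf. initCrcLength)


class ToyTailer:
    """Toy model of a monitor that follows files across renames.

    A file is identified by a CRC of its first HEAD_BYTES bytes rather than
    by its name, and the tailer remembers how many bytes of it were indexed.
    """

    def __init__(self):
        self.offsets = {}  # head CRC -> number of bytes already indexed
        self.indexed = []  # (source name, bytes) pairs that were "indexed"

    def scan(self, directory, mask=None):
        """directory maps file name -> current file contents (bytes)."""
        for name, data in directory.items():
            if mask is not None and not fnmatch(name, mask):
                continue  # the stanza never looks at this file
            key = zlib.crc32(data[:HEAD_BYTES])
            done = self.offsets.get(key, 0)
            if len(data) > done:
                # Only the unread tail is indexed, under the file's current name.
                self.indexed.append((name, data[done:]))
                self.offsets[key] = len(data)


def rotation_demo(mask=None):
    tailer = ToyTailer()
    old = b"o" * 300 + b"\n"  # longer than HEAD_BYTES, so its head CRC is stable
    tailer.scan({"metrics.log": old}, mask)
    # More events are written, then the file rotates before the next scan.
    tail = b"tail events\n"
    new = b"n" * 300 + b"\n"
    tailer.scan({"metrics.log.1": old + tail, "metrics.log": new}, mask)
    return tailer.indexed
```

With no mask, `rotation_demo()` indexes the leftover tail exactly once, under the source metrics.log.1, and nothing is indexed twice. With `rotation_demo("*.log")` the tail is never indexed, which is the loss the post describes.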

I verified this by looking at some of the data that was indexed in metrics.log.1 and found no corresponding entries in a metrics.log source. So, it isn't duplicating the data and the world is still turning just fine.
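The duplicate check described above amounts to looking for identical raw events under different sources. A minimal sketch, assuming the events have already been exported as (source, raw text) pairs; `events_in_multiple_sources` and the sample rows are hypothetical, not part of Splunk.

```python
from collections import defaultdict


def events_in_multiple_sources(events):
    """Return raw events that appear under more than one source.

    events: iterable of (source, raw_text) pairs, for example rows exported
    from a search over the _internal index.
    """
    sources_by_raw = defaultdict(set)
    for source, raw in events:
        sources_by_raw[raw].add(source)
    return {raw: srcs for raw, srcs in sources_by_raw.items() if len(srcs) > 1}


# Hypothetical sample rows; real metrics.log events carry their own timestamps.
sample = [
    ("metrics.log", "event A"),
    ("metrics.log.1", "event B"),
]
print(events_in_multiple_sources(sample))  # {} -> no event indexed twice
```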

richgalloway
SplunkTrust

Yes, that is expected behavior based on the input definition, but it is not optimal since it is duplicating data. I suggest changing the input (in system/local/inputs.conf) to

[monitor://$SPLUNK_HOME/var/log/splunk/*.log]

---
If this reply helps you, Karma would be appreciated.

dstuder
Communicator

Seems like a config oversight to me. Our forwarders are putting tons of data into the _internal index, and most of it comes from the metrics.log files. It seems to me they could save a LOT of space in the _internal index by not pulling in the rolled-over files by default.

richgalloway
SplunkTrust

Consider filing a bug report.

dstuder
Communicator

I already did. Thanks. 🙂
