We recently performed a Disaster Recovery (DR) switchover. After the switchover we found that none of the logs on the DR site were being pulled into Splunk, despite the DR hosts having the same config and file locations as the PROD site.
We were able to fix this by restarting the universal forwarders on all of these hosts. That is not much of a problem, as it can be automated, but we would like to understand the underlying cause.
A few background pieces of information:
- The servers were restarted 4-5 days before the DR switchover, so the forwarders were stopped and started at that point
- Data was being sent from these hosts consistently (*nix app, custom performance scripts, etc.)
- Configuration and setup is the same on both environments
- Restarts fixed the issues with no further action being taken
- Splunkforwarder version is 6.1.1
Any insight into this issue would be greatly appreciated!
Does no one have any insight into why Splunk may stop monitoring files? This is not just relating to DR -- it seems to do it intermittently, especially with certain hosts.
Don't suppose there'd be anything in the internal logs ($SPLUNK_HOME/var/log/splunk/splunkd.log) to help indicate any issues?
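If it helps, here is a rough Python sketch for pulling the file-monitoring messages out of splunkd.log. The component names it filters on (`TailingProcessor`, `WatchedFile`) and the `/opt/splunkforwarder` fallback path are assumptions on my part, so adjust them to whatever your forwarder actually logs.

```python
import os

# Assumed component names for file-monitoring messages; change as needed.
DEFAULT_NEEDLES = ("TailingProcessor", "WatchedFile")


def matching_lines(lines, needles=DEFAULT_NEEDLES):
    """Return the lines that mention any of the given substrings."""
    return [line.rstrip("\n") for line in lines if any(n in line for n in needles)]


def scan_splunkd_log(path=None, needles=DEFAULT_NEEDLES):
    """Scan splunkd.log and return the lines of interest."""
    if path is None:
        # Fallback install path is an assumption; SPLUNK_HOME wins if set.
        home = os.environ.get("SPLUNK_HOME", "/opt/splunkforwarder")
        path = os.path.join(home, "var", "log", "splunk", "splunkd.log")
    with open(path, errors="replace") as handle:
        return matching_lines(handle, needles)
```

Passing the name of the affected log file as one of the `needles` narrows the output to messages about that file only.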
There are a few ideas, though I'm not sure it'd be any of them:
Thanks for the response.
In response to the bullet points:
The main issue we seem to get is that files that aren't written to very often can stop being monitored by the forwarders from time to time. This is particularly true of one application, which logs infrequently, but when it does log, the information is rather vital. On the same forwarder, one log file will have stopped being monitored while the others in its inputs.conf are still being monitored (same custom Splunk collector, same input stanza).
There's nothing in splunkd.log, nor anything I can see in the metrics either. The forwarder simply stops monitoring the log and stops reporting that it's monitoring it. As in the original post, once it gets to this point, it simply stops monitoring for new files.
Are you using "ignoreOlderThan" in a monitor stanza in inputs.conf?
Unfortunately I was 😞
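For anyone who lands here with the same symptoms: as I understand the inputs.conf documentation, a file whose modification time is older than "ignoreOlderThan" is put on an ignore list and is not checked again until the forwarder restarts, even if it is written to later. That would fit infrequently written logs and a DR site that sat idle. Below is a minimal Python sketch for spotting files at risk. The function names are my own, not anything from Splunk, and treating a value with no unit as seconds is an assumption.

```python
import os
import re
import time

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value):
    """Convert a value such as '7d' or '2h' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd]?)\s*", value)
    if not match:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    # Assumption: a bare number with no unit is taken as seconds.
    return int(number) * _UNITS[unit or "s"]


def stale_files(root, ignore_older_than, now=None):
    """List files under root whose mtime is older than the threshold."""
    now = time.time() if now is None else now
    limit = parse_duration(ignore_older_than)
    stale = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if now - mtime > limit:
                stale.append(path)
    return sorted(stale)
```

Running `stale_files` against a monitored directory with the same value as the stanza's "ignoreOlderThan" shows which logs are quiet enough to be dropped. If a vital log turns up in that list, raising or removing the setting for that stanza is probably safer than relying on restarts.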