Solved: Re: After a Disaster Recovery switchover, why did ...

alekksi · ‎02-09-2015

Hi all,

We recently performed a Disaster Recovery switchover. After the switchover, we found that none of the logs on the DR site were being pulled into Splunk, even though the DR hosts have the same config and file locations as the PROD site.

We were able to sort this out by restarting the universal forwarders on all of these hosts. This is not much of a problem, as it can be automated, but we would like to understand the underlying cause.

A few background pieces of information:
- The servers were restarted 4-5 days before the DR switchover and therefore the forwarders were stopped and started
- Data was being sent from these hosts consistently ('nix app, custom performance scripts, etc.)
- Configuration and setup is the same on both environments
- Restarts fixed the issues with no further action being taken
- Splunkforwarder version is 6.1.1

Any insight into this issue would be greatly appreciated!

Thanks,
Alex

harsmarvania57 · ‎06-04-2015

Are you using "ignoreOlderThan" in the monitor stanza in inputs.conf?

alekksi · ‎06-10-2015

Unfortunately I was 😞
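
For anyone landing here with the same symptom: as far as I understand the inputs.conf documentation, ignoreOlderThan puts a file on an ignore list once its modification time is older than the configured window, and ignored files are not checked again until the forwarder restarts, even if they are written to later. That would match what was seen here. Below is a minimal sketch for finding which monitor stanzas set it. The stanza paths and sourcetypes in SAMPLE are made up for illustration, and the parsing uses Python's generic configparser, which only approximates Splunk's .conf syntax (it will not handle settings placed before the first stanza, for example).

```python
import configparser


def find_ignore_older_than(conf_text):
    """Return {stanza: value} for monitor stanzas that set ignoreOlderThan."""
    parser = configparser.ConfigParser(
        interpolation=None,        # .conf values may contain '%'
        strict=False,              # tolerate repeated stanzas or keys
        allow_no_value=True,
        delimiters=("=",),         # ':' appears inside stanza names and paths
        comment_prefixes=("#",),
    )
    parser.optionxform = str       # keep key case (ignoreOlderThan is camelCase)
    parser.read_string(conf_text)

    hits = {}
    for stanza in parser.sections():
        if stanza.startswith("monitor://") and parser.has_option(stanza, "ignoreOlderThan"):
            hits[stanza] = parser.get(stanza, "ignoreOlderThan")
    return hits


# Hypothetical inputs.conf content, for illustration only.
SAMPLE = """\
[monitor:///var/log/app_a/*.log]
sourcetype = app_a
ignoreOlderThan = 7d

[monitor:///var/log/app_b/*.log]
sourcetype = app_b
"""

if __name__ == "__main__":
    for stanza, value in find_ignore_older_than(SAMPLE).items():
        print(f"{stanza}: ignoreOlderThan = {value}")
```

To run it against a real forwarder, read the relevant inputs.conf into a string and pass it in instead of SAMPLE.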

alekksi · ‎03-03-2015

Does no one have any insight into why Splunk may stop monitoring files? This is not just related to DR -- it seems to happen intermittently, especially on certain hosts.

srioux · ‎03-03-2015

Don't suppose there'd be anything in the internal logs ($SPLUNK_HOME/var/log/splunk/splunkd.log) to help indicate any issues?

There are a few ideas, though I'm not sure it's any of them:

- Forwarder was set up with local configurations, but wasn't restarted for them to take effect.
- Forwarder was set up to get configs from the deployment server, but without restartSplunkd enabled (so new configs may not take effect).
- May be hitting a particular documented bug (see the release notes, e.g. http://docs.splunk.com/Documentation/Splunk/6.1.6/ReleaseNotes/6.1.3)
- Disk partition issues (is the disk full?)

alekksi · ‎03-04-2015

Thanks for the response.

In response to the bullet points:

- Forwarders are all set up using configs from the deployment server. All of these forwarders have been restarted relatively recently (the ones causing the biggest headaches are restarted once a week).
- I can't find any documented bug.
- We have independent monitoring which alerts the support teams on disk issues, so disk usage is within bounds.

The main issue we see is that files that aren't written to very often can stop being monitored by the forwarders from time to time. This is particularly true of one application, which logs infrequently, but when it does log, the information is vital. On the same forwarder, one log file will have stopped being monitored while the others in its inputs.conf are still being picked up (same custom Splunk collector, same input stanza).

There's nothing in the splunkd.log, nor anything I can see in the metrics either. It simply stops monitoring the log and stops reporting that it's monitoring the log. As with the case in the original post, if it gets to this point, it simply stops monitoring for new files.
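
Since the files that drop out are the ones written to least often, one thing worth checking is whether their last-modified time ever falls outside an ignoreOlderThan window. Here is a rough sketch of that check, run on the host itself. The function names are my own, the window format handled is just a number followed by s, m, h or d, and a bare number is treated as seconds purely as an assumption of this sketch.

```python
import os
import time

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def window_seconds(value):
    """Convert a window such as '7d' or '12h' to seconds."""
    value = value.strip()
    if value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)  # no unit given: treated as seconds (assumption of this sketch)


def files_outside_window(paths, window, now=None):
    """Return (path, age_in_seconds) for files last modified longer ago than the window."""
    now = time.time() if now is None else now
    limit = window_seconds(window)
    stale = []
    for path in paths:
        age = now - os.path.getmtime(path)
        if age > limit:
            stale.append((path, age))
    return stale


if __name__ == "__main__":
    import sys
    # Usage: python check_age.py 7d /path/to/log1 /path/to/log2 ...
    if len(sys.argv) > 2:
        for path, age in files_outside_window(sys.argv[2:], sys.argv[1]):
            print(f"{path}: last modified {age / 86400:.1f} days ago")
```

Any file this reports is one that has been quiet for longer than the window, which is exactly the kind of infrequently written log described above.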

After a Disaster Recovery switchover, why did Linux universal forwarders running Splunk 6.1.1 not pick up new logs in the monitored directory?
