What is recommended to get a Splunk universal forwarder to pick up data after a SQL Server cluster failover?

sat94541
Communicator

Customer has many SQL Server clusters that are using Windows Failover Clustering.
Splunk is installed at the node-level (so if there are 4 physical servers, Splunk is installed 4 times).

When an instance "fails over" from one node to another (say from node 1 to node 2), the Splunk agent on node 2 does not start picking up the log file.
A simple restart of the forwarding agent on node 2 "fixes" this problem.

We suspect the cause is that each SQL failover group has a drive letter that moves from one host to another on failover. For example, when the SQL failover group is on node 1, the "E:\" drive is physically mounted on node 1. When it "fails over" to node 2, the "E:\" drive is pulled away from node 1 and mounted on node 2. The Splunk forwarder on node 2, while configured to look on the E:\ drive, has already "excluded" this search path since the drive didn't exist at startup.

This is why restarting the forwarder allows it to "find" the E:\ drive and grab the log files.

What solution does Splunk recommend for this?
Is there a Splunk configuration setting that makes the forwarder keep looking for log files even if the drive isn't there? Or do we need to auto-restart the agent on failover?


rbal_splunk
Splunk Employee

Splunk isn't cluster-aware (MSCS, Microsoft Cluster Service), so the Splunk Universal Forwarder is behaving as expected.
Cluster admins are generally familiar with this type of issue (it isn't new or unique) and work around it by adding a component to the cluster failover process.

Reference this article from 2009 about a "generic script resource": https://blogs.msdn.microsoft.com/clustering/2009/09/28/creating-and-configuring-a-generic-script-res...

So the cluster admin should script the forwarder service restart as part of the failover.
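
For illustration only, here is a minimal Python sketch of the "restart the forwarder when the drive shows up" idea. It is not the generic script resource from the article above (that is configured inside the cluster itself); it is a simple polling watcher that could run on each node instead. The service name `SplunkForwarder` is assumed to be the default Windows service name for the universal forwarder, `E:\` is the drive from the question, and the function names are hypothetical. Check all of them against your environment.

```python
import os
import subprocess
import time
from typing import Callable

SERVICE_NAME = "SplunkForwarder"  # assumption: default forwarder service name
DRIVE_ROOT = "E:\\"               # clustered drive from the question


def drive_present(root: str = DRIVE_ROOT) -> bool:
    """Return True when the clustered drive is currently mounted on this node."""
    return os.path.isdir(root)


def restart_service(name: str = SERVICE_NAME) -> None:
    """Stop and start a Windows service with the built-in `net` command."""
    # The stop may fail if the service is not running; that is fine.
    subprocess.run(["net", "stop", name], check=False)
    subprocess.run(["net", "start", name], check=True)


def watch(
    is_present: Callable[[], bool],
    restart: Callable[[], None],
    polls: int,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `polls` times and restart whenever the drive goes from absent to present.

    Returns the number of restarts that were triggered.
    """
    restarts = 0
    was_present = is_present()
    for _ in range(polls):
        sleep(interval)
        now_present = is_present()
        if now_present and not was_present:
            # The drive just arrived on this node, i.e. a failover landed here.
            restart()
            restarts += 1
        was_present = now_present
    return restarts
```

Calling `watch(drive_present, restart_service, polls=120)` from a scheduled task would cover roughly an hour at the default 30-second interval. Having the cluster failover process itself trigger the restart, as suggested above, avoids the polling delay.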
