
What is recommended to get a Splunk universal forwarder to pick up data after a SQL server cluster failover?

sat94541
Communicator

The customer has many SQL Server clusters that use Windows Failover Clustering.
Splunk is installed at the node-level (so if there are 4 physical servers, Splunk is installed 4 times).

When an instance "fails over" from one node to another (say from node 1 to node 2), the Splunk agent on node 2 does not start picking up the log file.
A simple restart of the forwarding agent on node 2 "fixes" this problem.

We suspect the cause is that each SQL Server failover group has a drive letter that moves from one host to another on failover. For example, when the SQL failover group is on node 1, the "E:\" drive is physically mounted on node 1. When it "fails over" to node 2, the "E:\" drive is pulled away from node 1 and mounted on node 2. The Splunk forwarder on node 2, although configured to look on the E:\ drive, has already "excluded" this search path because the drive didn't exist when the forwarder started.

This is why restarting the forwarder allows it to "find" the E:\ drive and grab the log files.
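If a restart is what makes the forwarder "find" the drive, one stopgap is a small watchdog on each node that restarts the forwarder whenever the monitored drive goes from absent to present. The Python below is a minimal sketch under stated assumptions, not a Splunk-recommended configuration: the service name `SplunkForwarder`, the path `E:\`, the 30-second poll interval, and the helper names `restart_forwarder` and `watch` are all hypothetical and should be adjusted for your environment. It must run with administrator rights.

```python
import os
import subprocess
import time
from typing import Callable, Optional

# Assumptions (hypothetical, adjust for your environment): the Universal
# Forwarder's Windows service is named "SplunkForwarder" and the clustered
# SQL Server disk is mounted as E:\ on whichever node owns the group.
SERVICE_NAME = "SplunkForwarder"
WATCHED_PATH = "E:\\"


def restart_forwarder(service: str = SERVICE_NAME) -> None:
    """Restart the forwarder service (requires an elevated shell)."""
    # 'net stop' returns non-zero if the service is already stopped, so don't fail on it.
    subprocess.run(["net", "stop", service], check=False)
    subprocess.run(["net", "start", service], check=True)


def watch(
    path: str = WATCHED_PATH,
    exists: Callable[[str], bool] = os.path.exists,
    on_appear: Callable[[], None] = restart_forwarder,
    poll_seconds: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for `path` and call `on_appear` each time it goes from absent to present.

    Runs forever when max_polls is None; otherwise returns the number of
    restarts triggered after max_polls polls.
    """
    was_present = exists(path)
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_seconds)
        polls += 1
        is_present = exists(path)
        if is_present and not was_present:
            # The drive just arrived (e.g. after a failover to this node), so restart
            # the forwarder to make it pick up the monitored path again.
            on_appear()
            restarts += 1
        was_present = is_present
    return restarts


if __name__ == "__main__":
    watch()
```

Polling only reacts on the absent-to-present edge, so the forwarder is not restarted repeatedly while the drive stays mounted. There is still a gap of up to one poll interval after failover. Triggering the restart from the cluster itself avoids that gap.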

What sort of solution does Splunk recommend we do for this?
Is there a Splunk configuration setting where we can have the forwarder continue to look for log files, even if the drive isn't there? Do we need to auto-restart the agent on failover?


rbal_splunk
Splunk Employee

Splunk isn't cluster-aware (it has no integration with MSCS, Microsoft Cluster Service), so the Splunk Universal Forwarder is behaving as expected.
Cluster admins are familiar with this type of issue (it isn't new or unique at all) and typically work around it by adding a component to the cluster failover process.

See this 2009 article about creating a "generic script resource": https://blogs.msdn.microsoft.com/clustering/2009/09/28/creating-and-configuring-a-generic-script-res...

So the cluster admins should script a restart of the forwarder service as part of the failover process.
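The failover-time restart might look roughly like the Python below. It is a minimal sketch, not a tested cluster resource. It waits for the clustered disk to appear on the node that now owns the group, then restarts the forwarder with PowerShell's `Restart-Service`. The service name `SplunkForwarder`, the drive `E:\`, the 120-second timeout, and the helper names are hypothetical. How the cluster invokes the script (for example, from the generic script resource in the article above) is also an assumption left to the cluster admins.

```python
import os
import subprocess
import sys
import time
from typing import Callable, List

# Assumptions (hypothetical, adjust for your environment): the forwarder's
# Windows service is "SplunkForwarder" and the clustered disk mounts as E:\.
SERVICE_NAME = "SplunkForwarder"
CLUSTER_DISK = "E:\\"


def wait_for_path(
    path: str,
    timeout_s: float,
    exists: Callable[[str], bool] = os.path.exists,
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 2.0,
) -> bool:
    """Return True once `path` exists, or False if it does not appear within timeout_s."""
    waited = 0.0
    while not exists(path):
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
    return True


def restart_command(service: str = SERVICE_NAME) -> List[str]:
    """Build the PowerShell command line that restarts the forwarder service."""
    return ["powershell", "-NoProfile", "-Command", f"Restart-Service -Name {service} -Force"]


def main() -> int:
    # Intended to run on the node that has just taken ownership of the SQL Server group.
    if not wait_for_path(CLUSTER_DISK, timeout_s=120):
        print(f"{CLUSTER_DISK} is not mounted; leaving {SERVICE_NAME} alone", file=sys.stderr)
        return 1
    subprocess.run(restart_command(), check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Waiting for the disk before restarting matters. If the forwarder restarts before the drive is mounted, it sees the same missing path that caused the problem in the first place.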
