What is recommended to get a Splunk universal forwarder to pick up data after a SQL server cluster failover?

sat94541
Communicator

Customer has many SQL Server clusters that are using Windows Failover Clustering.
Splunk is installed at the node-level (so if there are 4 physical servers, Splunk is installed 4 times).

When an instance "fails over" from one node to another (say from node 1 to node 2), the Splunk agent on node 2 does not start picking up the log file.
A simple restart of the forwarding agent on node 2 "fixes" this problem.

We suspect the cause is that the SQL failover groups have a drive letter that moves from one host to another on failover. For example, when the SQL failover group is on node 1, the "E:\" drive is physically mounted on node 1. When it "fails over" to node 2, the "E:\" drive is pulled away from node 1 and mounted on node 2. The Splunk forwarder on node 2, while configured to look on the E:\ drive, has already "excluded" this search path since the drive didn't exist at startup.

This is why restarting the forwarder allows it to "find" the E:\ drive and grab the log files.

What solution does Splunk recommend for this?
Is there a Splunk configuration setting that makes the forwarder keep looking for log files even if the drive isn't there? Or do we need to auto-restart the agent on failover?

rbal_splunk
Splunk Employee

Splunk isn't cluster-aware (it has no knowledge of MSCS, Microsoft Cluster Services), so the Splunk Universal Forwarder is behaving as expected.
Cluster admins are generally familiar with this type of issue (it isn't a new or unique problem at all) and work around it by adding a component to the cluster failover process.

Reference this article from 2009 about a "generic script resource": https://blogs.msdn.microsoft.com/clustering/2009/09/28/creating-and-configuring-a-generic-script-res...

So the cluster admin should script a restart of the forwarder service as part of the failover process.
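
As a rough illustration of the "restart when the drive shows up" idea, here is a minimal Python sketch. It is not the generic script resource described in the linked article; it is a simple polling watcher that would run on each node and restart the forwarder when the watched drive goes from absent to present. The service name `SplunkForwarder`, the `E:\` path, the 10-second interval, and the function names `restart_service`, `drive_arrived`, and `watch` are all assumptions made for this example, so check them against your own installation.

```python
import os
import subprocess
import time
from typing import Callable, Optional

# Assumptions for this sketch: verify both values on your own nodes.
SERVICE_NAME = "SplunkForwarder"   # assumed Windows service name of the forwarder
WATCH_PATH = "E:\\"                # clustered drive from the question


def restart_service(name: str = SERVICE_NAME) -> None:
    """Restart a Windows service with the built-in `net` command."""
    # Stopping may fail if the service is already stopped, so don't raise on it.
    subprocess.run(["net", "stop", name], check=False)
    subprocess.run(["net", "start", name], check=True)


def drive_arrived(was_present: bool, is_present: bool) -> bool:
    """True only on the transition from 'drive missing' to 'drive mounted'."""
    return is_present and not was_present


def watch(
    path: str = WATCH_PATH,
    exists: Callable[[str], bool] = os.path.exists,
    restart: Callable[[], None] = restart_service,
    interval: float = 10.0,
    max_polls: Optional[int] = None,
) -> int:
    """Poll `path` and call `restart` each time it newly appears.

    Runs forever when `max_polls` is None; returns the number of restarts
    otherwise (useful for testing).
    """
    present = exists(path)
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        now = exists(path)
        if drive_arrived(present, now):
            restart()
            restarts += 1
        present = now
        polls += 1
    return restarts
```

Calling `watch()` with no arguments polls indefinitely, so it would need to be run as a scheduled task or service with rights to control the forwarder service. Hooking the restart into the cluster failover process itself, as suggested above, avoids the polling delay and is likely the cleaner option where the cluster admins can add it.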
