What is recommended to get a Splunk universal forwarder to pick up data after a SQL server cluster failover?

sat94541
Communicator

Customer has many SQL Server clusters that are using Windows Failover Clustering.
Splunk is installed at the node-level (so if there are 4 physical servers, Splunk is installed 4 times).

When an instance "fails over" from one node to another (say from node 1 to node 2), the Splunk agent on node 2 does not start picking up the log file.
A simple restart of the forwarding agent on node 2 "fixes" this problem.

We suspect that the cause of this is the fact that the SQL failover groups have a drive letter that moves from one host to another on failover. For example, when the SQL failover group is on node 1, the "E:\" drive will be physically mounted on node 1. When it "fails over" to node 2, the "E:\" drive is pulled away from node 1 and mounted on node 2. The Splunk forwarder on node 2, while configured to look on the E:\ drive, has already "excluded" this search path since the drive didn't exist on startup.

This is why restarting the forwarder allows it to "find" the E:\ drive and grab the log files.

What solution does Splunk recommend for this?
Is there a Splunk configuration setting where we can have the forwarder continue to look for log files, even if the drive isn't there? Do we need to auto-restart the agent on failover?

rbal_splunk
Splunk Employee

Splunk isn't cluster-aware (MSCS, Microsoft cluster services), so the Splunk Universal Forwarder is behaving as expected.
Cluster admins are generally familiar with this type of issue (it isn't a new or unique problem at all) and typically work around it by adding a component to the cluster failover process.

Reference this article from 2009 about a "generic script resource": https://blogs.msdn.microsoft.com/clustering/2009/09/28/creating-and-configuring-a-generic-script-res...

So the cluster admin should script the forwarder service restart as part of the failover.
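
As a rough illustration of scripting that restart, here is a minimal Python sketch. Note that it is not the generic script resource from the article above; it is a simple polling stand-in that watches for the clustered drive to appear on a node and restarts the forwarder when it does. Assumptions: the drive is E:\ as in the question, the forwarder's Windows service is named SplunkForwarder (the usual default, but check your install), and the function names and the 15-second interval are made up for this example.

```python
import os
import subprocess
import time

SERVICE_NAME = "SplunkForwarder"  # assumption: default service name of the universal forwarder
WATCHED_PATH = "E:\\"             # the clustered drive from the question
POLL_SECONDS = 15.0               # assumption: arbitrary polling interval


def drive_appeared(was_present, is_present):
    """Return True only on the transition from 'absent' to 'present'."""
    return is_present and not was_present


def restart_service(name=SERVICE_NAME):
    """Restart a Windows service with the standard 'net stop' / 'net start' commands."""
    # The stop may fail if the service is not running; that is fine here.
    subprocess.run(["net", "stop", name], check=False)
    subprocess.run(["net", "start", name], check=True)


def watch(path=WATCHED_PATH, exists=os.path.exists, restart=restart_service,
          sleep=time.sleep, interval=POLL_SECONDS, max_polls=None):
    """Poll for the drive and restart the forwarder each time it shows up.

    'exists', 'restart' and 'sleep' are injectable so the loop can be
    exercised without a real cluster. Returns the number of restarts.
    """
    present = exists(path)
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        now = exists(path)
        if drive_appeared(present, now):
            restart()
            restarts += 1
        present = now
        polls += 1
    return restarts


if __name__ == "__main__":
    watch()  # runs until stopped
```

This would have to run on every node under an account that is allowed to control services. A restart driven by the cluster itself is still the cleaner option, since it fires exactly at failover instead of relying on polling.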
