
Splunk Stability???

kmcconnell
Path Finder

We are having several issues with our Splunk servers/architecture, and I wanted to know if anyone else has had similar issues. If so, were you able to get them fixed? To give you an idea of our layout, we have two indexers and two search heads (all four are big physical boxes, Windows OS). We have 10 heavy forwarders spread around the world (these are virtual boxes, nine on Windows and one on Linux). We also have several hundred universal forwarders (Windows OS) sending data to the heavy forwarders for filtering. We are now on Splunk 6.0.x, but that hasn’t helped, and I’m wondering if it might have made things worse. The kinds of issues we are having are:

  • Indexers stop receiving data (sometimes both – very bad!)
  • Universal forwarders lose their connection to the heavy forwarders (restarting Splunk on the heavy forwarders fixes the issue, but we haven’t found out why the connection is dropped)
  • Sometimes the universal forwarders stop sending data (again, restarting Splunk fixes the problem)
  • There are times when the Splunk service is “running”, but Splunk is not actually running.

I’d just like to know if anyone else is having stability issues besides us. I thought that Splunk was supposed to be one of those rock solid applications that just runs, but we haven’t seen that. Maybe if it was running on Linux, but we don’t have that option.


thesteve
Path Finder

I haven't experienced this particular problem, but I had a similar one in the past that took weeks to track down. (If only we'd had Splunk back then...)

We had a firewall in place that would, after a period of inactivity, silently stop forwarding data on a given open port. To the client it looked like it was sending data; on the server it looked like no data was being sent. It was only when we correlated the lost connection with a 30-minute inactivity period that we were able to figure things out. We had assumed that a firewall issue would mean a blocked port, not a non-forwarded port.

The final solution for us was to use TCP Keep-alives configured at the OS level. As a temporary solution, we wrote a small script that generated a small amount of activity every 29 minutes.
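
To make the keep-alive idea concrete, here is a rough Python sketch of turning keep-alives on for a single test socket, so you can check whether a connection through the firewall survives an idle period. This is only an illustration of the mechanism, not what we ran: our actual fix was the OS-wide setting (on Linux that is the `net.ipv4.tcp_keepalive_time` sysctl, on Windows the `KeepAliveTime` registry value under `Tcpip\Parameters`). The function name `enable_keepalive` and the 25-minute default are my own assumptions, picked to stay under a 30-minute firewall timeout.

```python
import socket


def enable_keepalive(sock, idle_s=1500, interval_s=60, probes=5):
    """Turn on TCP keep-alive probes for one socket (hypothetical helper).

    idle_s:     seconds of silence before the first probe (assumed 25 min)
    interval_s: seconds between probes once probing starts
    probes:     unanswered probes before the OS drops the connection
    """
    # Portable part: ask the OS to send keep-alive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket timing is platform specific, so set only what exists here.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            try:
                sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            except OSError:
                # Option is defined but rejected; fall back to OS defaults.
                pass
    return sock
```

Call it on a socket before connecting to the far side of the firewall, leave the connection idle for longer than the suspected timeout, then try to send. If the connection only survives with keep-alives on, an idle timeout is the likely culprit.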

I'm not saying this is your problem, but it's worth spending a few minutes looking into.


grijhwani
Motivator

Sounds like either, yes, Windows is being its usual flaky self (he says with admitted prejudice), or you have a recently introduced network architecture or infrastructure problem.


kmcconnell
Path Finder

Yes. Sometimes it's a connection failure; other times nothing shows up in the logs. I mainly wanted to see if anyone else was having issues with their Splunk instance.


lukejadamec
Super Champion

Have you checked for errors in the various splunkd logs?
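
If the logs are too big to eyeball, something like the Python sketch below can give a quick count of problem lines per component. It assumes each splunkd.log line starts with a date, a time, and a timezone offset, followed by the log level and the component name; check that against your own files before trusting the counts. The name `summarize_errors` is just made up for this example.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> <tz offset> <LEVEL> <Component> - message"
LINE_RE = re.compile(
    r"^\S+ \S+ \S+ (?P<level>WARN|ERROR|FATAL)\s+(?P<component>\S+)"
)


def summarize_errors(lines):
    """Count WARN/ERROR/FATAL lines per (level, component) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("level"), match.group("component"))] += 1
    return counts
```

To use it, open splunkd.log (normally under `var/log/splunk` in the Splunk install directory), pass the file object to `summarize_errors`, and print `counts.most_common(10)`. Running it on a forwarder and on an indexer around the time of an outage should show which components are complaining.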
