Restarting the host (server) without stopping the splunkd service

earakam
Path Finder

Hi,

I was wondering whether there is any effect on Splunk when the host is restarted suddenly while the Splunk service is still running.
What could go wrong in that case? Or shouldn't I worry at all?
I couldn't find any documentation about this issue, so I would appreciate it if anyone could also point me to the docs, if they exist.

Thank you.

Richfez
SplunkTrust

I have not seen any UF have an issue during an unexpected restart (e.g. power outage, crash or what have you), AS LONG AS the system itself recovers well enough.

Given their purpose and what they do, a Universal Forwarder should be robust. In a situation like the one you describe, only processes that are writing something to disk (and something significant, to be honest) should ever have a problem. The UF doesn't really write much of anything important to disk, so there is little for an interruption to corrupt.

I understand this is not canon, but it makes sense. Of course, the more times you pull the plug from under a running system the higher the likelihood that the OS itself will not come back up. I would therefore heartily recommend doing your best to limit such unintended problems!

As two anecdotal pieces of evidence...

We accidentally tested this two weeks ago, when one of our older SANs went offline and several dozen UFs (among a few other systems) had their disks pulled out from under them. In all those cases, mostly Microsoft Windows Server 2012 with some Server 2008 R2 and a handful of *nix, the OS came back fine and the UF came back fine with no issues, picking up right where it left off. In one case an old ext2 filesystem had minor issues after coming back, but even then, after an fsck and a bit of tidying up, the OS was fine and the UF started pushing data just as it had before.

And we seem to have one VMware host per year provide us with an unexpected test of High Availability. In other words, about once per year one dies and takes down 50-100 virtual machines hard (the guests see this as a power loss event, unlike the previous example, where the guest was still running, sort of, but had lost all its hard drives). They restart on other hosts in a few moments, and I've never seen a UF not work fine afterwards.

earakam
Path Finder

Hi rick7177!
Thank you for the detailed response.
This is very useful information.

Thank you!

earakam
Path Finder

Sorry, some additional information:
by Splunk, I meant the Splunk Universal Forwarder.

Thanks.

MuS
SplunkTrust

There shouldn't be any problem, since the UF is only reading logs. The UF will also resume reading each log file from its last known position in that file.
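
To make the "last known position" idea concrete, here is a minimal Python sketch of a reader that remembers how far it got in a log file and resumes from there after a restart. This is only an illustration of the general technique, not the UF's actual code: the function names and the JSON checkpoint file are made up for this example, and the UF keeps its own internal record of read positions.

```python
import json
import os
import tempfile


def load_offset(checkpoint_path):
    """Return the saved byte offset, or 0 if no checkpoint exists yet."""
    try:
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return 0


def save_offset(checkpoint_path, offset):
    """Persist the offset atomically: write a temp file, then rename it
    over the old checkpoint, so a crash leaves either the old or the new
    checkpoint on disk, never a half-written one."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, checkpoint_path)


def read_new_lines(log_path, checkpoint_path):
    """Return complete lines appended since the last checkpoint,
    then advance the checkpoint past them."""
    offset = load_offset(checkpoint_path)
    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            lines.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)
    save_offset(checkpoint_path, offset)
    return lines
```

Calling `read_new_lines("app.log", "checkpoint.json")` (hypothetical file names) after a reboot returns only what was appended since the previous call, because the offset lives on disk rather than in the process. Note that this sketch saves the offset before handing the lines back; a more careful design would save it only after the lines have been delivered.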

cheers, MuS

earakam
Path Finder

Understood, thanks for the response!

masonmorales
Influencer

Windows or Linux? And is it a full version of Splunk or the Universal Forwarder?

earakam
Path Finder

Thanks for the response!
It's Linux, and it's the Universal Forwarder!
