Splunk query to determine how long a Splunk instance was down in the past?

Path Finder

Hi,
We need to provide a report that captures how long a Splunk instance was down in the past.
Is it possible to capture this from the internal logs? What Splunk query can we use to get the duration?

Note: The Splunk instances are currently up and running.

Esteemed Legend

By default, Splunk only stores the _* logs for 30 days, so if you need to go further back than that, you can infer an outage by looking for a large jump in latency, defined as _time subtracted from _indextime.
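
A rough sketch of that latency check follows. The idea is that events generated while an indexer was down are only indexed after it comes back, so their _indextime lags well behind their _time. The index name your_index, the 90-day range, the 10-minute span, and the 600-second threshold are all placeholders picked for illustration, not values from this thread; tune them to your environment.

```spl
index=your_index earliest=-90d@d latest=now
| eval latency = _indextime - _time
| bin _time span=10m
| stats max(latency) AS max_latency count BY _time host
| where max_latency > 600
| sort 0 _time
```

Buckets that show up in the result are candidates for an outage window, not proof of one: a forwarder or network problem can produce the same kind of latency spike.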

Motivator

Splunk writes to resource_usage.log every 10 seconds. With this query you can find gaps in the logging, which can indicate when the Splunk process was down. This is only an estimate: you have to add or subtract the time Splunk needs to start up and shut down.

index=_introspection sourcetype=splunk_resource_usage

If other Splunk instances send their internal logs to the indexing layer (always the best practice), then you can find the "downtime" for those instances by specifying the host:

index=_introspection sourcetype=splunk_resource_usage host=XXX
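
To turn the raw events into a list of gaps, one possible approach is to compare each event's timestamp with the previous one and keep only the large differences. This is a sketch rather than a tested report: the 60-second threshold and the output field names (last_seen, first_seen_after, gap) are assumptions made for illustration, and host=XXX is the same placeholder as above.

```spl
index=_introspection sourcetype=splunk_resource_usage host=XXX
| sort 0 _time
| streamstats current=f window=1 last(_time) AS prev_time
| eval gap_seconds = _time - prev_time
| where gap_seconds > 60
| eval last_seen = strftime(prev_time, "%Y-%m-%d %H:%M:%S")
| eval first_seen_after = strftime(_time, "%Y-%m-%d %H:%M:%S")
| eval gap = tostring(gap_seconds, "duration")
| table host last_seen first_seen_after gap
```

Each row is a stretch during which no resource usage events were indexed for that host. Raise the threshold if the instance logs less often than every 10 seconds.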

SplunkTrust

You can search the splunkd.log files for "Shutdown complete" and "Splunkd starting" then calculate the difference between those events.
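
A minimal sketch of that calculation, assuming splunkd.log is indexed in _internal with sourcetype=splunkd (the usual default) and that host=XXX stands in for the instance of interest. The field names event_type, stopped_at, started_at, and downtime are made up for this example.

```spl
index=_internal sourcetype=splunkd host=XXX ("Shutdown complete" OR "Splunkd starting")
| eval event_type = if(match(_raw, "(?i)shutdown complete"), "shutdown", "start")
| sort 0 _time
| streamstats current=f window=1 last(event_type) AS prev_type last(_time) AS prev_time
| where event_type="start" AND prev_type="shutdown"
| eval downtime = tostring(_time - prev_time, "duration")
| eval stopped_at = strftime(prev_time, "%Y-%m-%d %H:%M:%S"), started_at = strftime(_time, "%Y-%m-%d %H:%M:%S")
| table host stopped_at started_at downtime
```

This only pairs a start event with a shutdown event that immediately precedes it, so a start with no matching shutdown logged is left out of the result.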

---
If this reply helps you, an upvote would be appreciated.

Path Finder

I only see "splunkd starting" events; there are no events for "shutdown complete". What does that mean?

SplunkTrust

I've seen that when Splunk is restarted from the GUI. In that case, there is no indication of the restart. I don't have a solution for that case.
