
Identifying inactive Hosts that crashed

legosawyer
Engager

I'm trying to build an alert that identifies inactive hosts that crashed.

Inactive host: a host that hasn't logged anything in the past hour
Host that didn't crash: logs a message like ".* Gracefully Exited"
Host that did crash: never logs a message like the one above and eventually becomes inactive


For inactive hosts, I've found this search to be useful. It searches the past 2 hours for hosts that haven't logged within the last hour:

| tstats latest(_time) as latest where index=a sourcetype=b source=c earliest=-2h by host
| eval logged_within_past_hour = if(latest > relative_time(now(),"-1h"),1,0), time_of_host_last_log = strftime(latest,"%c")
| where logged_within_past_hour=0

I'm able to use this Splunk search to find the logs of hosts that terminated gracefully:

index=a sourcetype=b "Gracefully Exited"

Is there a way to find hosts that crashed and have become inactive? I don't want to include the hosts that terminated gracefully and didn't crash.








richgalloway
SplunkTrust

Check out the TrackMe and Meta Woot! apps. They do that kind of thing for you.

If you want to do it yourself, be aware that finding something that is not there is not Splunk's strong suit. You'll need a list of expected hosts to compare against those seen recently. See this blog entry for a good write-up on it.

https://www.duanewaddle.com/proving-a-negative/
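As a rough sketch of that comparison, assuming you keep the expected hosts in a lookup file (the name expected_hosts.csv and its single host column are hypothetical, not something from your environment), you can append a zero count for every expected host and keep the ones that never got a real count in the last hour:

```spl
| tstats count where index=a sourcetype=b source=c earliest=-1h by host
| append
    [| inputlookup expected_hosts.csv
     | fields host
     | eval count=0 ]
| stats sum(count) as count by host
| where count=0
```

This only tells you which expected hosts have gone quiet. It does not separate crashes from graceful exits, so you would still need to filter out hosts that logged the exit message.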

---
If this reply helps you, Karma would be appreciated.

legosawyer
Engager

Hm, similar to that post, could I do this kind of set manipulation? (I could also use a table with count > 1 to make a set.)


Take the hosts that have logged in [-2hr, now]
- hosts that printed the graceful exit message in [-2hr, now]      (excludes graceful exits)
----------------------------------------------------------------
Now we're left with running and crashed hosts that have logged in [-2hr, now]

- hosts that have logged in [-1hr, now]      (excludes hosts that were active within the last hour)
----------------------------------------------------------------
Now we're left with still-running and crashed hosts that have ONLY logged in [-2hr, -1hr]. In this case the "still-running" ones haven't logged anything in [-1hr, now], so I'm going to declare them crashed.

Does this work out? I'm unsure how to implement this.


richgalloway
SplunkTrust

I think you need a couple of subsearches. Something like this:

<<Hosts that have logged>> earliest=-2h latest=now
    NOT [ <<Hosts that print the graceful exit message>> earliest=-2h latest=now | fields host | format ]
    NOT [ <<Hosts that have logged>> earliest=-1h latest=now | fields host | format ]

Be sure to test each subsearch separately to make sure they return the expected results.
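With the placeholders filled in from the searches in your original post, it might look something like the sketch below. This is untested against your data. It assumes the graceful exit message lives in index=a sourcetype=b, quotes the phrase so that it matches the exact wording, and adds a dedup to each subsearch so that format only has to emit one term per host. The final stats and eval are an addition that reports each remaining host once, along with its last log time.

```spl
index=a sourcetype=b source=c earliest=-2h latest=now
    NOT [ search index=a sourcetype=b "Gracefully Exited" earliest=-2h latest=now
          | dedup host | fields host | format ]
    NOT [ search index=a sourcetype=b source=c earliest=-1h latest=now
          | dedup host | fields host | format ]
| stats latest(_time) as latest by host
| eval time_of_host_last_log = strftime(latest, "%c")
```

Each subsearch has to start with a generating command, which is why they begin with an explicit search. Keep in mind that subsearches have result limits, so if you have a very large number of hosts, check that neither subsearch is being truncated.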
