Splunk Search

Identifying inactive hosts that crashed

legosawyer
Engager

I'm trying to identify inactive hosts that crashed (through an alert).

- Inactive host: a host that hasn't logged anything in the past hour.
- Host that didn't crash: logs a message like ".* Gracefully Exited".
- Host that did crash: never logs a message like the one above and eventually becomes inactive.
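The three definitions above can be modeled as a small per-host classifier. This is a minimal Python sketch, not Splunk code. It assumes each host's events are available as hypothetical (epoch_seconds, message) pairs, reuses the post's ".* Gracefully Exited" pattern, and ignores restarts: a host that exited gracefully and later came back is still reported as "exited". The name `classify_host` is hypothetical.

```python
import re
from typing import List, Tuple

# Pattern from the post: a graceful shutdown logs "<anything> Gracefully Exited".
GRACEFUL = re.compile(r".* Gracefully Exited")
INACTIVE_AFTER = 3600  # seconds of silence before a host counts as inactive (1 hr)


def classify_host(events: List[Tuple[float, str]], now: float) -> str:
    """Classify one host as 'running', 'exited', 'crashed', or 'unknown'.

    `events` is a hypothetical list of (epoch_seconds, message) pairs for the host.
    """
    if not events:
        return "unknown"  # no events at all: nothing to classify from
    if any(GRACEFUL.match(msg) for _, msg in events):
        return "exited"  # logged the graceful-exit message, so it didn't crash
    last_seen = max(t for t, _ in events)
    if last_seen > now - INACTIVE_AFTER:
        return "running"  # still logging within the past hour
    return "crashed"  # no graceful exit and silent for more than an hour
```

The "unknown" case matters: a host with no events in the searched window can't be classified from its own logs at all.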


For inactive hosts, I've found this search useful. It searches the past 2 hours for hosts that haven't logged within the last hour:

| tstats latest(_time) as latest where index=a sourcetype=b source=c earliest=-2h by host
| eval logged_within_past_hour = if(latest > relative_time(now(), "-1h"), 1, 0)
| eval time_of_host_last_log = strftime(latest, "%c")
| where logged_within_past_hour=0
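To sanity-check the `eval`/`where` logic outside Splunk, here is a Python sketch of the same comparison. It assumes the `latest` values are epoch seconds (which is what `latest(_time)` yields) and that `relative_time(now(), "-1h")` is a plain one-hour offset with no snapping. The helper name and sample data are hypothetical. Note that because the tstats search uses `earliest=-2h`, a host silent for more than two hours never appears in its input at all.

```python
import time
from typing import Dict, List, Optional


def inactive_hosts(latest_by_host: Dict[str, float], now: Optional[float] = None) -> List[str]:
    """Mirror of: eval logged_within_past_hour = if(latest > relative_time(now(), "-1h"), 1, 0)
    followed by: where logged_within_past_hour=0."""
    if now is None:
        now = time.time()
    cutoff = now - 3600  # one hour before "now", in epoch seconds
    # A host is inactive when its latest event is NOT strictly after the cutoff.
    return sorted(h for h, latest in latest_by_host.items() if not latest > cutoff)
```

Usage: `inactive_hosts({"web01": 9000.0, "web02": 5000.0}, now=10000.0)` keeps only `web02`, since its last event is more than an hour old.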

I can use this Splunk search to find events where a host exited gracefully:

index=a sourcetype=b "Gracefully Exited"

Is there a way to find hosts that crashed and have become inactive? I don't want to include hosts that terminated successfully and didn't crash.








richgalloway
SplunkTrust

Check out the TrackMe and Meta Woot! apps. They do that kind of monitoring for you.

If you want to do it yourself, be aware that finding something that is not there is not Splunk's strong suit. You'll need a list of expected hosts to compare against the hosts seen recently. See this blog entry for a good write-up on it.

https://www.duanewaddle.com/proving-a-negative/
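The comparison described above boils down to a set difference between the hosts you expect and the hosts actually seen in the recent window. A minimal Python sketch, assuming the expected list comes from somewhere you maintain outside the searched events (for example a lookup); the function name and host names are hypothetical.

```python
from typing import Iterable, Set


def missing_hosts(expected: Iterable[str], seen_recently: Iterable[str]) -> Set[str]:
    """Hosts that should be reporting but did not appear in the recent window."""
    # A search can only return hosts that have events, so "absent" hosts must be
    # computed against an external list of what should be there.
    return set(expected) - set(seen_recently)
```

Usage: `missing_hosts(["web01", "web02", "db01"], ["web01", "db01"])` returns `{"web02"}`.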

---
If this reply helps you, Karma would be appreciated.

legosawyer
Engager

Hm, similar to that post, could I do this kind of set manipulation? (I could also use a table with count > 1 to build each set.)


Take the hosts that have logged in [2hr ago, now]
- minus the hosts that print the graceful exit message in [2hr ago, now]     (removes graceful exits)
------------------------------------------------------------
Now we're left with running and crashed hosts that have logged in [2hr ago, now].

- minus the hosts that have logged in [1hr ago, now]     (removes hosts still logging within the last hour)

Now we're left with still-running* and crashed hosts that have ONLY logged in [2hr ago, 1hr ago]. The "still-running" ones haven't logged in [1hr ago, now], so I'm going to declare them crashed.
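The set arithmetic above can be checked on toy data. A minimal Python sketch, assuming each set holds the host names returned by the corresponding search window; the function name and host names are hypothetical.

```python
from typing import Set


def crashed_hosts(logged_2h: Set[str], graceful_2h: Set[str], logged_1h: Set[str]) -> Set[str]:
    """Hosts seen in the last 2 hr that neither exited gracefully nor logged in the last hour."""
    running_or_crashed = logged_2h - graceful_2h  # step 1: drop graceful exits
    return running_or_crashed - logged_1h  # step 2: drop hosts still logging in the last hour
```

For example, with `logged_2h = {"web01", "web02", "web03", "web04"}`, `graceful_2h = {"web02"}` and `logged_1h = {"web01"}`, the result is `{"web03", "web04"}`. One caveat that follows from the logic: a host that went silent more than two hours ago is in none of these sets, so it is never reported, which is the limitation richgalloway points out above.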

Does this work? I'm unsure how to implement it.


richgalloway
SplunkTrust

I think you need a couple of subsearches. Something like this:

index=a sourcetype=b earliest=-2h latest=now
    NOT [ search index=a sourcetype=b "Gracefully Exited" earliest=-2h latest=now | stats count by host | fields host | format ]
    NOT [ search index=a sourcetype=b earliest=-1h latest=now | stats count by host | fields host | format ]
| stats latest(_time) as last_seen by host

Be sure to test each subsearch separately to make sure they return the expected results.
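Each `[ ... | fields host | format ]` subsearch is replaced by a search string built from its results, and the `NOT` in front of it excludes those hosts from the outer search. A rough Python sketch of that expansion, assuming `format`'s default separators, which produce something like `( ( host="web01" ) OR ( host="web02" ) )`. The helpers are hypothetical, assume each subsearch returns at least one host, and do not escape special characters.

```python
from typing import Iterable


def format_hosts(hosts: Iterable[str]) -> str:
    """Approximate the string that `| fields host | format` hands back to the outer search."""
    terms = [f'( host="{h}" )' for h in hosts]  # one parenthesized term per result row
    return "( " + " OR ".join(terms) + " )"


def outer_search(base: str, *subsearch_results: Iterable[str]) -> str:
    """Splice each subsearch's host list in after NOT, as in the query above."""
    parts = [base] + [f"NOT {format_hosts(hosts)}" for hosts in subsearch_results]
    return " ".join(parts)
```

Printing the expanded string this way is also a quick check that each subsearch returns the hosts you expect before combining them.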
