
How to calculate component availability from Nagios data

auntyem

I am trying to get component availability for the previous month from Nagios data coming into Splunk. Ultimately I want the percentage of time each component was available. I've developed the following search to calculate downtime for a single host (hardware names have been changed to protect the innocent):

index=nagios src_host=blahn1* HOST ALERT HARD NOT SERVICE* host="blah.umich.edu"|sort _time|delta _time AS duration p=1|where name="UP"|stats sum(duration) as total_down_time|table total_down_time
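The search above can be modelled in a few lines of Python to make its logic explicit. This is a minimal sketch, assuming each HOST ALERT has already been reduced to a timestamp and a state string; the `HostAlert` type and `total_down_time` function are hypothetical and not part of Splunk. Like the search, it only counts the gap immediately before each UP event, so it assumes every recovery is directly preceded by the alert that opened the outage.

```python
from typing import Iterable, NamedTuple


class HostAlert(NamedTuple):
    """Hypothetical shape of one Nagios HOST ALERT event."""
    time: float   # plays the role of Splunk's _time (epoch seconds)
    state: str    # plays the role of the extracted `name` field, e.g. "DOWN" or "UP"


def total_down_time(alerts: Iterable[HostAlert]) -> float:
    """Model of: sort _time | delta _time AS duration p=1 | where name="UP" | stats sum(duration)."""
    ordered = sorted(alerts, key=lambda a: a.time)   # sort _time (ascending)
    total = 0.0
    for prev, cur in zip(ordered, ordered[1:]):      # delta p=1: compare each event with the previous one
        if cur.state == "UP":                        # where name="UP"
            total += cur.time - prev.time            # duration = time since the preceding alert
    return total


events = [
    HostAlert(1_000, "DOWN"),
    HostAlert(1_300, "UP"),     # 300 s outage
    HostAlert(5_000, "DOWN"),
    HostAlert(5_120, "UP"),     # 120 s outage
]
print(total_down_time(events))  # 420.0
```

If a host emits several non-UP alerts in a row (for example DOWN followed by UNREACHABLE), only the last gap before the UP event is counted, which is the same behaviour as the `delta`-based search.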

This appears to work correctly. If I can then calculate total uptime (eventually I'd like to exclude maintenance windows, but they aren't in Nagios yet, so I can't), I can get my percentage.
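For the percentage, uptime is the length of the reporting window minus the downtime. Inside Splunk, the `addinfo` command adds `info_min_time` and `info_max_time` fields describing the search's time range, and the time modifiers `earliest=-1mon@mon latest=@mon` restrict a search to the previous calendar month; combining those with the downtime sum is one plausible route, though it is untested here. The Python below only models the arithmetic, with hypothetical function names, and does not account for maintenance windows.

```python
from datetime import datetime, timezone


def previous_month_window(now: datetime) -> tuple[datetime, datetime]:
    """Return [start, end) of the calendar month before `now`."""
    this_month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if this_month_start.month == 1:
        prev_start = this_month_start.replace(year=this_month_start.year - 1, month=12)
    else:
        prev_start = this_month_start.replace(month=this_month_start.month - 1)
    return prev_start, this_month_start


def availability_percent(down_seconds: float, start: datetime, end: datetime) -> float:
    """Uptime as a percentage of the window: (window - downtime) / window * 100."""
    window = (end - start).total_seconds()
    if window <= 0:
        raise ValueError("window must have positive length")
    return (window - down_seconds) / window * 100.0


start, end = previous_month_window(datetime(2024, 3, 15, tzinfo=timezone.utc))
print(start.date(), end.date())                          # 2024-02-01 2024-03-01
print(round(availability_percent(3600, start, end), 4))  # one hour down in Feb 2024: 99.8563
```

Computing the window from calendar boundaries, rather than from the first and last events returned, avoids understating the period when a host was quiet at the start or end of the month.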

My questions are as follows:

  • I'm sure I'm not the first person to do this. Is there a better way?
  • If not, I can't find an easy way to calculate the uptime for the month from the time span of the search. Again, I'm sure I'm missing something there.
  • My search above covers a single host (src_host) in Nagios. I'd like to run it over a group of hosts, using tags to define the group. I've tried the following, which is close but doesn't quite work:

index=nagios tag=mxhosts HOST ALERT HARD NOT SERVICE* host="blah.umich.edu"|sort src_host, _time|delta _time AS duration p=1|delta src_host AS hostdiff|where name="UP" and ~something with hostdiff~|stats sum(duration) as total_down_time by src_host

Since I can't use a 'by' clause in the delta command, I was trying to use the hostdiff field created in the search to filter out durations where src_host changed (i.e., the previous event came from a different host, so that 'delta' shouldn't count).
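What the hostdiff filter is trying to achieve is "compare each event only with the previous event from the same src_host". The sketch below models that in Python by remembering the last event time per host; `AlertEvent` and `down_time_by_host` are hypothetical names. In SPL, `streamstats` does accept a `by` clause, so something along the lines of `streamstats current=f window=1 last(_time) AS prev_time by src_host` followed by `eval duration=_time-prev_time` may replace the hostdiff workaround; treat that as an untested suggestion.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AlertEvent(NamedTuple):
    """Hypothetical event shape: monitored host, epoch seconds, state."""
    src_host: str
    time: float
    state: str


def down_time_by_host(alerts: Iterable[AlertEvent]) -> dict[str, float]:
    """Sum the gap before each UP event, comparing only events from the same src_host."""
    last_time: dict[str, float] = {}                 # previous event time, per host
    totals: dict[str, float] = defaultdict(float)
    for a in sorted(alerts, key=lambda a: a.time):
        prev = last_time.get(a.src_host)
        if prev is not None and a.state == "UP":
            totals[a.src_host] += a.time - prev       # gap since this host's own previous alert
        last_time[a.src_host] = a.time
    return dict(totals)


events = [
    AlertEvent("mx1", 100, "DOWN"),
    AlertEvent("mx2", 150, "DOWN"),
    AlertEvent("mx2", 210, "UP"),   # mx2 down 60 s
    AlertEvent("mx1", 400, "UP"),   # mx1 down 300 s, despite mx2's alerts in between
]
print(down_time_by_host(events))    # {'mx2': 60.0, 'mx1': 300.0}
```

Hosts that never recovered (or never went down) during the period do not appear in the result, so they would need to be filled in with zero downtime before computing percentages.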

  • We have four Nagios hosts monitoring everything. I'd like to just pick one, but if a netsplit makes a host register an 'outage', I don't want that to be the one I rely on. Somehow I want the 'best' host at any given point in time. Any suggestions here?
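One alternative to picking a single 'best' poller is to let the four Nagios servers vote: count a monitored host as down only while at least `quorum` of them report it down, so an outage seen by just one poller during a netsplit drops out. This is a swapped-in quorum technique, not something the thread or Splunk for Nagios is confirmed to provide. The sketch assumes each poller's down intervals for one host have already been extracted (for example from DOWN/UP pairs grouped by the Nagios server's `host` field); `quorum_down_time` is a hypothetical name.

```python
Interval = tuple[float, float]  # [start, end) in epoch seconds


def quorum_down_time(down_by_poller: dict[str, list[Interval]], quorum: int) -> float:
    """Seconds during which at least `quorum` pollers saw the host as down (sweep line)."""
    edges: list[tuple[float, int]] = []
    for intervals in down_by_poller.values():
        for start, end in intervals:
            if end > start:
                edges.append((start, +1))   # this poller starts reporting DOWN
                edges.append((end, -1))     # this poller sees the host recover
    # At equal timestamps, process -1 before +1 so back-to-back intervals do not overlap.
    edges.sort()
    active = 0
    total = 0.0
    prev_t = None
    for t, delta in edges:
        if prev_t is not None and active >= quorum:
            total += t - prev_t             # the segment [prev_t, t) had enough agreeing pollers
        active += delta
        prev_t = t
    return total


pollers = {
    "nagios1": [(100, 400)],
    "nagios2": [(120, 380)],
    "nagios3": [(150, 390)],
    "nagios4": [(1000, 2000)],  # only this poller saw an "outage" (e.g. a netsplit)
}
print(quorum_down_time(pollers, 3))  # 230.0
print(quorum_down_time(pollers, 1))  # 1300.0
```

With `quorum=3` of 4, the outage seen only by `nagios4` is ignored; `quorum=1` reproduces the pessimistic "any poller says down" view. Which threshold is appropriate depends on how the pollers are distributed across the network.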

Thanks


lukeh

Hi 🙂

Please upgrade to the latest release of Splunk for Nagios and let me know how you go 🙂

http://apps.splunk.com/app/352/

There are a number of new dashboards, including:

Livestatus Host SLA

Livestatus Service SLA

All the best,

Luke 🙂
