
How to calculate component availability from Nagios data


I am trying to get component availability for the previous month from Nagios data coming into Splunk. Ultimately I want the percentage of time each component was available. I've developed the following search to calculate downtime for a single host (hardware names have been changed to protect the innocent):

index=nagios src_host=blahn1* HOST ALERT HARD NOT SERVICE* host=""|sort _time|delta _time AS duration p=1|where name="UP"|stats sum(duration) as total_down_time|table total_down_time

This appears to be working correctly. If I can then calculate total uptime, I can get my percentage. (Eventually I'd like to exclude maintenance windows, but they aren't in Nagios yet, so I can't.)
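
To make the arithmetic explicit, here is a minimal Python sketch of the same calculation done offline. It assumes the HARD host alerts for one host have been exported as (epoch_seconds, state) pairs and that times are in UTC; the function names are hypothetical. One deliberate difference from the search above: it counts each outage from the first non-UP alert until the next UP alert, whereas `delta ... p=1` only measures back to the immediately preceding alert. The percentage divides by the length of the previous calendar month, and outages that straddle the month boundary are not clipped in this sketch.

```python
from datetime import datetime, timezone


def previous_month_bounds(now):
    """Return (start, end) of the calendar month before `now`, keeping its tzinfo."""
    this_month = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if this_month.month == 1:
        prev_month = this_month.replace(year=this_month.year - 1, month=12)
    else:
        prev_month = this_month.replace(month=this_month.month - 1)
    return prev_month, this_month


def downtime_seconds(alerts):
    """Total seconds from the first non-UP alert until the next UP alert.

    `alerts` is a list of (epoch_seconds, state) pairs for ONE host,
    e.g. (1706745600, "DOWN"). Consecutive non-UP alerts (DOWN, then
    UNREACHABLE) belong to the same outage; repeated UP alerts add nothing.
    """
    total = 0.0
    down_since = None
    for t, state in sorted(alerts):
        if state == "UP":
            if down_since is not None:
                total += t - down_since
            down_since = None
        elif down_since is None:
            down_since = t
    return total


def availability_pct(down_seconds, start, end):
    """Percentage of the [start, end) window that was not downtime."""
    period = (end - start).total_seconds()
    return 100.0 * (period - down_seconds) / period


if __name__ == "__main__":
    start, end = previous_month_bounds(datetime(2024, 3, 15, tzinfo=timezone.utc))
    down = downtime_seconds([(start.timestamp() + 3600, "DOWN"),
                             (start.timestamp() + 5400, "UP")])
    print(start.date(), end.date(), down, round(availability_pct(down, start, end), 3))
```

Inside Splunk itself, the `addinfo` command adds `info_min_time` and `info_max_time` fields holding the search's time bounds, which could play the role of `previous_month_bounds` when computing the period length.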

My questions are as follows:

  • I'm sure I'm not the first person to do this. Is there a better way?
  • If not, I can't find an easy way to calculate the uptime for the month from the timespan of the search. Again, I'm sure I'm missing something there.
  • My search above is for a single host (src_host) in Nagios. I'd like to run it over a group of hosts, using tags to define the group. I've tried the following, but it isn't quite working, though it's close:

index=nagios tag=mxhosts HOST ALERT HARD NOT SERVICE* host=""|sort src_host, _time|delta _time AS duration p=1|delta src_host AS hostdiff|where name="UP" and ~something with hostdiff~|stats sum(duration) as total_down_time by src_host

Since I can't use a 'by' clause in the delta command, I was trying to use the hostdiff field created in the search to filter out durations where src_host changed (i.e. the previous event was from a different host, so its delta shouldn't count).

  • We have four Nagios servers monitoring everything. I'd like to just pick one, but if a netsplit makes a host register an 'outage' on one server, I don't want that server to be the one I picked. Somehow I want the 'best' server at any given point in time. Any suggestions here?
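
The per-host grouping and the four-server question can both be sketched in Python, again assuming the alerts have been exported as plain tuples; all names here are hypothetical. `outage_intervals_by_host` measures outages separately for each host, which is what the hostdiff filter was trying to achieve. In SPL the usual way to get per-group differences is `streamstats` with a `by` clause (for example `streamstats current=f last(_time) AS prev_time by src_host`), since `delta` has no `by`. `agreed_downtime` is one possible policy for the four Nagios servers, not the only one: a host counts as down only while at least `quorum` servers agree, so an outage seen by a single partitioned server is ignored.

```python
from collections import defaultdict


def outage_intervals_by_host(alerts):
    """Split one Nagios server's HARD host alerts into per-host outage intervals.

    `alerts` is an iterable of (host, epoch_seconds, state). Alerts are grouped
    by host before measuring, so an interval never starts on one host and
    ends on another (the problem the hostdiff field was meant to solve).
    """
    by_host = defaultdict(list)
    for host, t, state in alerts:
        by_host[host].append((t, state))
    result = {}
    for host, host_alerts in by_host.items():
        intervals, down_since = [], None
        for t, state in sorted(host_alerts):
            if state == "UP":
                if down_since is not None:
                    intervals.append((down_since, t))
                down_since = None
            elif down_since is None:
                down_since = t
        result[host] = intervals
    return result


def agreed_downtime(per_server_intervals, quorum=None):
    """Seconds during which at least `quorum` servers saw the same host as down.

    `per_server_intervals` holds one list of (start, end) outages per Nagios
    server, all for the same host; each server's own intervals must not overlap.
    quorum=None means every server must agree.
    """
    if quorum is None:
        quorum = len(per_server_intervals)
    edges = []
    for intervals in per_server_intervals:
        for start, end in intervals:
            edges.append((start, 1))
            edges.append((end, -1))
    edges.sort()  # at equal times, ends (-1) sort before starts (+1)
    total, depth, last_t = 0.0, 0, None
    for t, step in edges:
        # depth = number of servers reporting "down" since the previous edge
        if depth > 0 and depth >= quorum:
            total += t - last_t
        depth += step
        last_t = t
    return total


if __name__ == "__main__":
    server_a = outage_intervals_by_host([("mx1", 0, "DOWN"), ("mx2", 10, "DOWN"),
                                         ("mx1", 30, "UP"), ("mx2", 100, "UP")])
    server_b = outage_intervals_by_host([("mx1", 20, "DOWN"), ("mx1", 40, "UP")])
    print(agreed_downtime([server_a["mx1"], server_b["mx1"]]))
```

With the default quorum every server must agree; a lower value such as quorum=3 would still count an outage that one of the four servers failed to see.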




Hi 🙂

Please upgrade to the latest release of Splunk for Nagios and let me know how you go 🙂

There are a number of new dashboards including:

Livestatus Host SLA

Livestatus Service SLA

All the best,

Luke 🙂
