I have been asked to come up with a dashboard for my management team, and I am trying to build it from some Nagios performance stats. The data comes from an ICMP poll against every device on the network, every 5 minutes. The data looks like this:
June 4 00:00:00 host_name = switch1 loss=0%
June 4 00:00:00 host_name = switch2 loss=100%
June 4 00:05:00 host_name = switch1 loss=0%
June 4 00:05:00 host_name = switch2 loss=0%
I created the following search:
| eval ping_up=if(loss!="100%", 100, 0)
| stats avg(ping_up) as uptime
| eval uptime=round(uptime,2)
| eval uptime = uptime . "%"
First of all, this doesn't seem very efficient. Second, they are now asking for a monthly trend over the past 2 years as well as a real-time dashboard (i.e., current uptime is X%). I can't seem to find a way to do either without a huge hit to the system.
For your "real-time" dashboard, you can use the search below. Change earliest to your liking. Display the result as a single value and set the formatting there.
index=logs earliest=1h@h | convert num(loss) as uptime | stats avg(uptime) as uptime
For the monthly trend:
index=logs earliest=2y@y | convert num(loss) as uptime | timechart span=1mon avg(uptime) as uptime | eval uptime=tostring(uptime, "commas")."%"
To make this more efficient, you can save this as an accelerated report OR create a summary index and use that.
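As a rough sketch of the summary index option: a scheduled search could run once an hour and write per-host counts into a summary index, and the monthly trend could then read only those small summary events. The index name `uptime_summary` and the field names `polls_up` and `polls_total` are made up for illustration, the summary index has to exist before `collect` can write to it, and the `loss="100%"` test assumes the same "anything short of total loss counts as up" rule as the original search.

```spl
index=logs earliest=-1h@h latest=@h
| eval ping_up = if(loss="100%", 0, 1)
| bin _time span=1h
| stats sum(ping_up) as polls_up count as polls_total by _time host_name
| collect index=uptime_summary
```

The two-year trend would then run against the summary instead of the raw events:

```spl
index=uptime_summary earliest=-2y@y
| timechart span=1mon sum(polls_up) as polls_up sum(polls_total) as polls_total
| eval uptime = round(100 * polls_up / polls_total, 2)
| fields _time uptime
```

Storing sums and counts rather than hourly averages keeps the monthly figure correctly weighted even if some hours have fewer polls than others.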
Thanks. This looks much better, except that loss is actually packet loss (0% loss is good). This search gives me 0% uptime when I should be getting 100%. Any suggestions on reversing it?
I tried changing it to:
index=logs earliest=1h@h | convert num(loss) as uptime | eval uptime=(100-uptime) | stats avg(uptime) as uptime
That seems to work.
Either that, or rename the fields to downtime 🙂
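For reference, the reversal discussed in this thread can also be written with eval alone. This is only a sketch: it assumes loss is extracted as a string such as "0%" or "100%", and loss_pct is an illustrative field name.

```spl
index=logs earliest=-1h@h
| eval loss_pct = tonumber(rtrim(loss, "%"))
| eval uptime = 100 - loss_pct
| stats avg(uptime) as uptime
| eval uptime = round(uptime, 2)
```

For the four sample events in the question (loss of 0%, 100%, 0% and 0%), the per-event uptime values are 100, 0, 100 and 100, so the average is 75. Note that this treats partial loss as partial uptime (20% loss counts as 80% up), whereas the original if(loss!="100%", 100, 0) test counts any response at all as fully up; which definition fits is a reporting decision.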