Solved: How to create a dashboard on uptime reporting for ...

mmacdonald70 · ‎06-06-2016

I have been asked to come up with a dashboard for my management team. I am trying to pull it from some Nagios performance stats. The data has an icmp poll against every network device on the network, every 5 minutes. The data looks like this:

June 4 00:00:00 host_name = switch1 loss=0%
June 4 00:00:00 host_name = switch2 loss=100%
June 4 00:05:00 host_name = switch1 loss=0%
June 4 00:05:00 host_name = switch2 loss=0%

I created the following search

| eval ping_up=if(loss!="100%", 100,0) 
| stats avg(ping_up_ as uptime 
| eval uptime=round(uptime,2) 
| eval uptime = uptime 
| "%"

First of all, this doesn't seem very efficient. Second of all, now they are asking for a monthly trend over the past 2 years as well as a real-time dashboard (ie current uptime is X%). I can't seem to find a way to do these without a huge hit to the system.

sundareshr · ‎06-06-2016

For your "real-time" dashboard, you can use this . Change earliest to your liking. Display this as a single value and set the formatting there.

index=logs earliest=1h@h | convert num(loss) as uptime | stats avg(uptime) as uptime

For the monthly trend

index=logs earliest=2y@y | convert num(loss) as uptime | timechart span=1mon avg(uptime) as uptime | eval uptime=tostring(uptime, "commas")."%"

To make this more efficient, you can save this as an accelerated report OR create a summary index and use that.

View solution in original post

sundareshr · ‎06-06-2016

For your "real-time" dashboard, you can use this . Change earliest to your liking. Display this as a single value and set the formatting there.

index=logs earliest=1h@h | convert num(loss) as uptime | stats avg(uptime) as uptime

For the monthly trend

index=logs earliest=2y@y | convert num(loss) as uptime | timechart span=1mon avg(uptime) as uptime | eval uptime=tostring(uptime, "commas")."%"

To make this more efficient, you can save this as an accelerated report OR create a summary index and use that.

mmacdonald70 · ‎06-06-2016

I tried changing it to:

 index=logs earliest=1h@h | convert num(loss) as uptime| eval uptime=(100-uptime) | stats avg(uptime) as uptime

That seems to work

sundareshr · ‎06-06-2016

Either that, or in your rename the fields to downtime 🙂

mmacdonald70 · ‎06-06-2016

Thanks. This looks much better, except that loss is actually packet loss (0% loss is good). This search gives me 0% uptime when I should be getting 100%. Any suggestions on reversing it?

How to create a dashboard on uptime reporting for management?

Buttercup Games: Further Dashboarding Techniques

Message Parsing in SOCK

Exploring the OpenTelemetry Collector’s Kubernetes annotation-based discovery