
Server availability reports

Path Finder

Hello,

I need to prepare a server availability chart that shows "uptime / downtime" as a line chart.

The monitoring tool always begins counting uptime from the first observed host state, UP (as seen in the "CURRENT HOST STATE ....." row in the data snippet below). As soon as the monitoring tool detects that the server is not reachable, it changes the host state to DOWN;SOFT and logs a HOST ALERT event with a timestamp. It then retries a few times to see whether the host has recovered; if the host is still unreachable once the check interval lapses, it changes the state to DOWN;HARD. The host stays in that state until the monitoring tool detects that it is available again.
I need to prepare a line chart showing the time duration the server was up and the time duration the server was down. Any help in achieving this is highly appreciated.
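The state model described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the monitoring tool or of Splunk, assuming each event has already been reduced to an (epoch seconds, host_state) pair sorted oldest first. The names `effective_state`, `segments`, and the `count_soft_as_down` flag are hypothetical. Consecutive events with the same state are merged into one segment that ends at the next state change; the flag chooses whether time spent in DOWN;SOFT (before the retries confirm the outage) counts as downtime.

```python
from typing import List, Tuple

# (epoch seconds, host_state), e.g. (1505296351, "DOWN;SOFT")
Event = Tuple[int, str]
# (segment start, segment end, "UP" or "DOWN")
Segment = Tuple[int, int, str]


def effective_state(host_state: str, count_soft_as_down: bool = True) -> str:
    """Reduce a host_state value such as "DOWN;SOFT" to "UP" or "DOWN"."""
    state, _, kind = host_state.partition(";")
    if state == "DOWN" and kind == "SOFT" and not count_soft_as_down:
        # Outage not yet confirmed by the retries: treat the host as still up.
        return "UP"
    return state


def segments(events: List[Event], count_soft_as_down: bool = True) -> List[Segment]:
    """Merge consecutive events with the same effective state into segments.

    Each segment ends at the timestamp of the next event, so the last event
    (which has no successor) does not produce a closed segment.
    """
    out: List[Segment] = []
    for (t, raw), (t_next, _) in zip(events, events[1:]):
        state = effective_state(raw, count_soft_as_down)
        if out and out[-1][2] == state:
            # Same state as the previous segment: extend it.
            start, _, _ = out[-1]
            out[-1] = (start, t_next, state)
        else:
            out.append((t, t_next, state))
    return out
```

Summing `end - start` per state then gives total up and down seconds; with `count_soft_as_down=False`, short DOWN;SOFT blips that recover on retry are not counted as outages.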

My data looks like this:

time,"c_time",classification,"host_name","host_state","host_message"
1505278803,"09/13/2017 07:00:03","CURRENT HOST STATE","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.499ms"
1505296351,"09/13/2017 11:52:31","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505296368,"09/13/2017 11:52:48","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.486ms"
1505299437,"09/13/2017 12:43:57","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505299460,"09/13/2017 12:44:20","HOST ALERT","server1.contoso.com","DOWN;HARD","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505299608,"09/13/2017 12:46:48","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.563ms"
1505308266,"09/13/2017 15:11:06","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505308282,"09/13/2017 15:11:22","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.671ms"
1505310169,"09/13/2017 15:42:49","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505310194,"09/13/2017 15:43:14","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.537ms"
1505310474,"09/13/2017 15:47:54","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505310507,"09/13/2017 15:48:27","HOST ALERT","server1.contoso.com","DOWN;HARD","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505310550,"09/13/2017 15:49:10","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.729ms"
1505313807,"09/13/2017 16:43:27","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505313820,"09/13/2017 16:43:40","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.686ms"
1505317401,"09/13/2017 17:43:21","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505317446,"09/13/2017 17:44:06","HOST ALERT","server1.contoso.com","DOWN;HARD","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505317813,"09/13/2017 17:50:13","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.579ms"
1505328210,"09/13/2017 20:43:30","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505328278,"09/13/2017 20:44:38","HOST ALERT","server1.contoso.com","DOWN;HARD","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505328345,"09/13/2017 20:45:45","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.523ms"
1505331558,"09/13/2017 21:39:18","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505331621,"09/13/2017 21:40:21","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 1357.259ms"

New Member

I don't know what you mean by "line 4 in the rex , after the first? mark and before the . write". Please elaborate.

Champion

Well, I don't know exactly what you mean by 'I need to prepare a line chart showing the time duration the server was up and the time duration the server was down'.
You do realize that downtime is very small compared to uptime, so plotting both on the same time axis makes the graph look very ugly. Anyway, here is the query:

| eval t=strptime(strftime(_time,"%m/%d/%Y %H:%M:%S"),"%m/%d/%Y %H:%M:%S")
| reverse
| rex field=host_state "^(?<st>.*?);"
| streamstats current=false last(st) as prevst, last(t) as prevt
| eval downtime=if(prevst="DOWN", round((t-prevt)/60,2), 0)
| eval uptime=if(prevst="UP", round((t-prevt)/60,2), 0)
| fieldformat _time=strftime(_time,"%m/%d/%Y %H:%M")
| table _time, uptime, downtime
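For readers without a Splunk instance at hand, here is a rough Python equivalent of what the query computes per event, assuming (epoch seconds, host_state) input; `uptime_downtime` is a hypothetical name. The minutes between two consecutive events are attributed to the state of the earlier event, and DOWN;SOFT counts as down.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (epoch seconds, host_state such as "UP;HARD")


def uptime_downtime(rows: List[Row]) -> List[Tuple[int, float, float]]:
    """Per event: (epoch, minutes up, minutes down) since the previous event."""
    out: List[Tuple[int, float, float]] = []
    prev_state: Optional[str] = None
    prev_t: Optional[int] = None
    for t, host_state in sorted(rows):  # oldest first; the query uses | reverse
        state = host_state.split(";", 1)[0]  # "DOWN;SOFT" -> "DOWN"
        # Like streamstats current=false: compare only with the previous event.
        minutes = 0.0 if prev_t is None else round((t - prev_t) / 60, 2)
        up = minutes if prev_state == "UP" else 0.0
        down = minutes if prev_state == "DOWN" else 0.0
        out.append((t, up, down))
        prev_state, prev_t = state, t
    return out
```

The first event has no predecessor, so it gets 0 for both columns.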

===
I recommend using multi-series chart mode with independent Y axes. The statistics output of the query gives you what you are looking for. The uptime and downtime values in the query above are in minutes.
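Because downtime is tiny compared to uptime, a single availability figure can complement the chart. A minimal Python sketch that totals per-event (uptime, downtime) minutes, such as the columns the query produces, and derives availability as uptime / (uptime + downtime). The function name and this percentage definition are assumptions, not something stated in the thread.

```python
from typing import Iterable, Tuple


def availability(rows: Iterable[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Sum (uptime, downtime) minutes and return
    (total_up, total_down, availability_percent)."""
    total_up = total_down = 0.0
    for up, down in rows:
        total_up += up
        total_down += down
    observed = total_up + total_down
    # Guard against an empty or all-zero input.
    pct = 100.0 * total_up / observed if observed else 0.0
    return round(total_up, 2), round(total_down, 2), round(pct, 3)
```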

Champion

For some reason the rex did not get copied properly; use the one below instead.

| eval t=strptime(strftime(_time,"%m/%d/%Y %H:%M:%S"),"%m/%d/%Y %H:%M:%S")
| reverse
| rex field=host_state "^(?<st>.*?);"
| streamstats current=false last(st) as prevst, last(t) as prevt
| eval downtime=if(prevst="DOWN", round((t-prevt)/60,2), 0)
| eval uptime=if(prevst="UP", round((t-prevt)/60,2), 0)
| fieldformat _time=strftime(_time,"%m/%d/%Y %H:%M")
| table _time, uptime, downtime

Champion

| eval host_message=m1+" "+m2
| eval t=strptime(strftime(_time,"%m/%d/%Y %H:%M:%S"),"%m/%d/%Y %H:%M:%S")
| reverse
| rex field=host_state "^(?<status>.*?);"
| streamstats current=false last(status) as prevst, last(t) as prevt
| eval downtime=if(prevst="DOWN", round((t-prevt)/60,2), 0)
| eval uptime=if(prevst="UP", round((t-prevt)/60,2), 0)
| fieldformat _time=strftime(_time,"%m/%d/%Y %H:%M")
| table _time, uptime, downtime

Champion

Hmm, some issue with the pasting. In line 4 (the rex), after the first ? and before the ., write <status>
