Server availability reports

saurabhkunte · ‎09-16-2017

Hello,

I need to prepare a server availability chart depicting "uptime / downtime" represented on a line chart.

The monitoring tool always begins counting uptime from the first observed host state of UP (see "CURRENT HOST STATE ....." in the data snippet below). As soon as the tool detects that the server is not reachable, it changes the state to DOWN;SOFT and logs a HOST ALERT classification with a timestamp. It then retries a few times to see whether the host has recovered. If the check interval lapses and the host is still unreachable, it changes the state to DOWN;HARD. The host stays in that state until the monitoring tool detects that it is available again.
I need to prepare a line chart showing the time duration the server was up and the time duration the server was down. Any help in achieving this is highly appreciated.

My data looks like this:

time,"c_time",classification,"host_name","host_state","host_message"
1505278803,"09/13/2017 07:00:03","CURRENT HOST STATE","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.499ms"
1505296351,"09/13/2017 11:52:31","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505296368,"09/13/2017 11:52:48","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.486ms"
1505299437,"09/13/2017 12:43:57","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505299460,"09/13/2017 12:44:20","HOST ALERT","server1.contoso.com","DOWN;HARD","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505299608,"09/13/2017 12:46:48","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.563ms"
1505308266,"09/13/2017 15:11:06","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505308282,"09/13/2017 15:11:22","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.671ms"
1505310169,"09/13/2017 15:42:49","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505310194,"09/13/2017 15:43:14","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.537ms"
1505310474,"09/13/2017 15:47:54","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505310507,"09/13/2017 15:48:27","HOST ALERT","server1.contoso.com","DOWN;HARD","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505310550,"09/13/2017 15:49:10","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.729ms"
1505313807,"09/13/2017 16:43:27","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505313820,"09/13/2017 16:43:40","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.686ms"
1505317401,"09/13/2017 17:43:21","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505317446,"09/13/2017 17:44:06","HOST ALERT","server1.contoso.com","DOWN;HARD","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505317813,"09/13/2017 17:50:13","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.579ms"
1505328210,"09/13/2017 20:43:30","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505328278,"09/13/2017 20:44:38","HOST ALERT","server1.contoso.com","DOWN;HARD","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505328345,"09/13/2017 20:45:45","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 45.523ms"
1505331558,"09/13/2017 21:39:18","HOST ALERT","server1.contoso.com","DOWN;SOFT","CRITICAL - 10.0.0.10: rta nan, lost 100%"
1505331621,"09/13/2017 21:40:21","HOST ALERT","server1.contoso.com","UP;HARD","OK - 10.0.0.10 responds to ICMP. Packet 1, rta 1357.259ms"

craigbowens · ‎07-30-2019

I don't know what you mean by "line 4 in the rex , after the first? mark and before the . write". Please elaborate.

Sukisen1981 · ‎09-16-2017

Well, I don't know exactly what you mean by 'I need to prepare a line chart showing the time duration the server was up and the time duration the server was down'.
You do realize that downtime is very small compared to uptime, and that having both on the same time axis makes the graph look very ugly? Anyway, here is the query:

| eval t=strptime(strftime(_time,"%m/%d/%Y %H:%M:%S"),"%m/%d/%Y %H:%M:%S" )
| reverse
| rex field=host_alert "^(?<st>.*?);"
| streamstats current=false last(st) as prevst,last(t) as prevt
| eval downtime=if((st="UP" AND(prevst="DOWN")) OR (st="DOWN" AND(prevst="DOWN")),round((t-prevt)/60,2),0)
| eval uptime=if(downtime=0,round((t-prevt)/60,2),0)
| fieldformat _time=strftime(_time,"%m/%d/%Y %H:%M")
| table _time,uptime,downtime

===
I recommend using the multi-series chart mode with independent Y axes. The stats column from the output will give you what you are looking for. Uptime and downtime are calculated in minutes in the above query.
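To sanity-check the numbers outside Splunk, here is a minimal Python sketch of the same idea, not the query itself. It assumes the time between two consecutive events belongs to the state of the earlier event, which is what the `if` in the query reduces to when the status is only ever UP or DOWN. The names `SAMPLE` and `split_durations` are made up for illustration, and the rows are the first six events from the data in the question.

```python
from typing import List, Tuple

# First six events from the question's data: (epoch seconds, host_state).
SAMPLE: List[Tuple[int, str]] = [
    (1505278803, "UP;HARD"),
    (1505296351, "DOWN;SOFT"),
    (1505296368, "UP;HARD"),
    (1505299437, "DOWN;SOFT"),
    (1505299460, "DOWN;HARD"),
    (1505299608, "UP;HARD"),
]


def split_durations(events: List[Tuple[int, str]]) -> List[Tuple[int, int, int]]:
    """Return (epoch, uptime_seconds, downtime_seconds) for each event after the first."""
    ordered = sorted(events)  # oldest first, the role `| reverse` plays in the query
    rows = []
    for (prev_t, prev_state), (t, _state) in zip(ordered, ordered[1:]):
        elapsed = t - prev_t
        prev_status = prev_state.split(";", 1)[0]  # "DOWN;SOFT" -> "DOWN"
        if prev_status == "DOWN":
            rows.append((t, 0, elapsed))  # host was down for the whole gap
        else:
            rows.append((t, elapsed, 0))  # host was up for the whole gap
    return rows


# Print the per-event split in minutes, as the query does.
for t, up, down in split_durations(SAMPLE):
    print(t, round(up / 60, 2), round(down / 60, 2))
```

Keeping the durations in whole seconds until the final print avoids rounding surprises when the values are summed per hour or per day.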

Sukisen1981 · ‎09-16-2017

For some reason the rex did not get copied properly, use the below one instead.

| eval t=strptime(strftime(_time,"%m/%d/%Y %H:%M:%S"),"%m/%d/%Y %H:%M:%S" )
| reverse
| rex field=host_alert "^(?<st>.*?);"
| streamstats current=false last(st) as prevst,last(t) as prevt
| eval downtime=if((st="UP" AND(prevst="DOWN")) OR (st="DOWN" AND(prevst="DOWN")),round((t-prevt)/60,2),0)
| eval uptime=if(downtime=0,round((t-prevt)/60,2),0)
| fieldformat _time=strftime(_time,"%m/%d/%Y %H:%M")
| table _time,uptime,downtime

Sukisen1981 · ‎09-16-2017

| eval host_message=m1+" " +m2
| eval t=strptime(strftime(_time,"%m/%d/%Y %H:%M:%S"),"%m/%d/%Y %H:%M:%S" )
| reverse
| rex field=host_alert "^(?<status>.*?);"
| streamstats current=false last(status) as prevst,last(t) as prevt
| eval downtime=if((status="UP" AND(prevst="DOWN")) OR (status="DOWN" AND(prevst="DOWN")),round((t-prevt)/60,2),0)
| eval uptime=if(downtime=0,round((t-prevt)/60,2),0)
| fieldformat _time=strftime(_time,"%m/%d/%Y %H:%M")
| table _time,uptime,downtime

Sukisen1981 · ‎09-16-2017

hmm some issue with the pasting - line 4 in the rex , after the first? mark and before the . write
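For readers puzzled by the rex line: what seems to go missing when it is pasted is a named capture group, and the later `streamstats` calls refer to the captured field as `st` (or `status` in the last version), so that is presumably the group name. The sketch below shows the same extraction in Python, only as an illustration. Python spells a named group `(?P<name>...)` where Splunk's rex uses `(?<name>...)`, and `STATE_RE` and `extract_status` are made-up names. It is applied to the `host_state` values from the question's data, since that is the column holding values like "UP;HARD".

```python
import re
from typing import Optional

# Capture everything before the first semicolon into a group named "st".
STATE_RE = re.compile(r"^(?P<st>[^;]*);")


def extract_status(host_state: str) -> Optional[str]:
    """Return the part before the first ';' (e.g. 'UP'), or None if there is no ';'."""
    match = STATE_RE.match(host_state)
    return match.group("st") if match else None


for value in ("UP;HARD", "DOWN;SOFT", "DOWN;HARD"):
    print(value, "->", extract_status(value))
```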
