Hi All,
I am hoping you can help me out with the following:
I am preparing a report from the logs of our monitoring tool. It should show all service checks that are in an UNKNOWN state and did not go into an OK state during a certain time frame. Here is my search:
index=main sourcetype=omd service_alert=*
| search service_check="sep" | search host_name=filesrv01*
| convert timeformat="%F %H:%M:%S" ctime(_time) as last_discovered
| table _time last_discovered host_name service_check service_alert service_state service_message
| sort 0 host_name, _time
| streamstats reset_on_change=true last(_time) as lastseen, first(_time) as firstseen last(service_alert) as last_alert by host_name, service_check, service_alert, service_state, service_message
| convert timeformat="%F %H:%M:%S" ctime(lastseen) as lastseen_human
| convert timeformat="%F %H:%M:%S" ctime(firstseen) as firstseen_human
| sort 0 host_name, -_time, firstseen
| dedup consecutive=t host_name, service_check, service_alert, service_state, service_message
This gives me the following output:
_time | last_discovered | host_name | service_check | service_alert | service_state | firstseen | firstseen_human | last_alert | lastseen | lastseen_human
2017-10-17T20:00:03.000-0400 | 10/17/2017 20:00 | srv1 | sep service | OK | HARD | 1508278263 | 10/17/2017 18:11 | OK | 1508284803 | 10/17/2017 20:00
2017-10-17T18:11:03.000-0400 | 10/17/2017 18:11 | srv1 | sep service | OK | HARD | 1508278263 | 10/17/2017 18:11 | OK | 1508278263 | 10/17/2017 18:11
2017-10-17T18:11:03.000-0400 | 10/17/2017 18:11 | srv1 | sep service | OK | HARD | 1508278263 | 10/17/2017 18:11 | OK | 1508278263 | 10/17/2017 18:11
2017-10-17T18:06:37.000-0400 | 10/17/2017 18:06 | srv1 | Windows Time | UNKNOWN | HARD | 1508277997 | 10/17/2017 18:06 | UNKNOWN | 1508277997 | 10/17/2017 18:06
2017-10-17T18:06:37.000-0400 | 10/17/2017 18:06 | srv1 | sep service | UNKNOWN | HARD | 1508277997 | 10/17/2017 18:06 | UNKNOWN | 1508277997 | 10/17/2017 18:06
Now, the report should show only the service that did not change its state from UNKNOWN to any other value. In other words, the sep service should not show up in the report, since its last state changed from UNKNOWN to OK. How do I achieve that?
Any help appreciated. Thank you.
Adding an extra dedup command will filter out all but the most recent service_state. Then use where to discard those that are OK. The result contains the services that did not change to OK.
... | convert timeformat="%F %H:%M:%S" ctime(firstseen) as firstseen_human
| dedup service_state
| where service_state !="OK"
| sort 0 host_name, -_time, firstseen
| dedup consecutive=t host_name, service_check, service_alert, service_state, service_message
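If the report covers several hosts or checks at once, one possible variant is to take the latest alert per host and check with stats, then keep only the ones that are still UNKNOWN. This is an untested sketch. It assumes, as the sample output above suggests, that service_alert carries OK/UNKNOWN while service_state carries the state type (HARD in the sample). If your field extraction puts OK/UNKNOWN in service_state, swap the field name. The names lastseen and last_alert are reused from the question; last_message is just an illustrative name.

```
index=main sourcetype=omd service_alert=* host_name=filesrv01*
| stats latest(_time) as lastseen, latest(service_alert) as last_alert, latest(service_message) as last_message by host_name, service_check
| where last_alert="UNKNOWN"
| convert timeformat="%F %H:%M:%S" ctime(lastseen) as lastseen_human
| table host_name service_check last_alert lastseen_human last_message
```

Because stats latest() picks the newest event in each host_name/service_check group, the result does not depend on sort order, and a check that recovered to OK within the search time range drops out at the where step.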
You can mark your code and data with the code button (101 010) or by putting at least four spaces on the beginning of each line.
Thanks, Rich! This works for me.