host Status=Offline OR Status=Online | search target="" | selfjoin Status | sort _time,target | table _time,target,Status,src,host | dedup 1 Status,target | rename target as Agent_Host | rename Status as Current_Status | rename src as Source_IP
The machines go offline and then come back online. I need to monitor the duration of each downtime and, ideally, alert when the duration exceeds an hour.
Thanks in advance.
Hi
When you monitor downtime you have to monitor all servers, both those that are up and those that are down.
To be sure that every monitored server is checked, create a lookup (e.g. perimeter.csv) listing all the servers to monitor, then try something like this:
index=your_index Status=Offline OR Status=Online
| transaction host startswith="Offline" endswith="Online"
| eval host=upper(host), count=1
| append [ search index=_internal NOT [ search index=your_index Status=Offline OR Status=Online | dedup host | fields host ] | eval host=upper(host), count=10 | dedup host | fields host count ]
| append [ | inputlookup perimeter.csv | eval host=upper(host), count=0 | fields host count ]
| stats sum(count) AS Total values(duration) AS duration BY host
| eval Status=case(Total=0,"Server Down",Total=1,"Downtime="+tostring(duration,"duration"),Total>2,"Server Up")
| table host Status
Bye.
Giuseppe
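To cover the "alert when the duration exceeds an hour" part of the question, the transaction approach could be reduced to something like the sketch below. It assumes the events carry the fields host and Status (the original search uses target for the agent host, so substitute that field if needed) and that your_index is a placeholder. The transaction command adds a duration field in seconds, so one hour is 3600.

```
index=your_index (Status=Offline OR Status=Online)
| transaction host startswith=eval(Status="Offline") endswith=eval(Status="Online")
| where duration > 3600
| eval Downtime=tostring(duration,"duration")
| table _time host Downtime
```

Saved as an alert that triggers when the number of results is greater than zero, this would fire only for outages that have already ended, because by default transaction drops an Offline event that has no matching Online event yet.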
You can use the transaction command if there are multiple events:
your search | transaction Host, "other common unique fields for these two transaction" startswith="Offline" endswith="Online" | timechart max(duration)
Or use stats, if each host has only one Offline/Online pair in the search window:
your search | stats latest(_time) as End, earliest(_time) as Start by Host | eval Difference=End-Start | table Host Start End Difference
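Both searches above measure a downtime only after the machine has come back. To catch a machine that is still offline after an hour, one possible sketch is to look at the most recent status per host and compare its timestamp with the current time. The field names host and Status and the index name your_index are assumptions; adjust them to your data.

```
index=your_index (Status=Offline OR Status=Online)
| stats latest(Status) as Current_Status latest(_time) as Last_Change by host
| eval Offline_Seconds=if(Current_Status="Offline", now()-Last_Change, 0)
| where Offline_Seconds > 3600
| eval Offline_For=tostring(Offline_Seconds,"duration")
| table host Current_Status Offline_For
```

The search time range needs to be long enough to include the last status change of every host, otherwise a machine that went offline before the window starts will not appear in the results.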