I have a search that returns server events, and I would like to know when such an event is NOT followed by a recovery message within a short period of time.
In my example below, the event is triggered and it recovers 5 seconds later. In this case, I would NOT want this to return results.
Oct 21 06:40:13 cam-vm-mon3 mfsmount[3425]: master: connection lost (1)
Oct 21 06:40:18 cam-vm-mon3 mfsmount[3425]: registered to master
To give my question better context: I am running this on an old Splunk 4.3 install and configuring it as an alert, so these are alert criteria more than search criteria.
I have been experimenting with append and appendpipe but am not having any luck. Can anyone offer any suggestions?
Try something like this (with a base search that returns only these two types of events):
... | eval type=case(searchmatch("master: connection lost"), "down", searchmatch("registered to master"), "up", true(), "BUG!") | reverse | streamstats count(eval(type="down")) AS sessionID by host | eventstats latest(_time) AS latestTime latest(type) AS latestType by sessionID host | where type="down" | eval downSeconds=if(latestType="down", now(), latestTime) - _time | where downSeconds > 5
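To make the logic of that search easier to follow, here is a rough Python sketch of the same idea outside Splunk. It is only an illustration, not something Splunk runs: the function names (`classify`, `unrecovered_outages`), the tuple layout of the events, and the `threshold` parameter are all hypothetical. It assumes each event is a `(epoch_seconds, host, message)` tuple and that the base search has already limited the events to the two message types.

```python
def classify(message):
    """Mirror the eval/case step: tag each event as down, up, or unexpected."""
    if "master: connection lost" in message:
        return "down"
    if "registered to master" in message:
        return "up"
    return "BUG!"


def unrecovered_outages(events, now, threshold=5):
    """Return (host, down_time, down_seconds) for outages longer than threshold.

    events: iterable of (epoch_seconds, host, message) tuples, in any order.
    now: current time in epoch seconds, used for outages that never recovered.
    """
    sessions = {}   # (host, session_id) -> list of (time, type)
    counters = {}   # host -> number of "down" events seen so far

    # Oldest first, like `reverse`; each "down" starts a new session per host,
    # like `streamstats count(eval(type="down")) AS sessionID by host`.
    for t, host, message in sorted(events):
        event_type = classify(message)
        if event_type == "down":
            counters[host] = counters.get(host, 0) + 1
        session_id = counters.get(host, 0)
        sessions.setdefault((host, session_id), []).append((t, event_type))

    alerts = []
    for (host, _session_id), session_events in sessions.items():
        # Like `eventstats latest(_time) ... latest(type) ... by sessionID host`.
        latest_time, latest_type = max(session_events, key=lambda e: e[0])
        for t, event_type in session_events:
            if event_type != "down":
                continue
            # Still down: measure against now. Recovered: measure to recovery.
            end = now if latest_type == "down" else latest_time
            down_seconds = end - t
            if down_seconds > threshold:
                alerts.append((host, t, down_seconds))
    return alerts
```

With the two sample events from the question (down at 06:40:13, up at 06:40:18), the outage lasts exactly 5 seconds, which is not greater than the threshold, so nothing is returned and the alert would stay quiet.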
Additional thoughts on paths forward.
OR