Hi Experts,
We have a search that checks for critical Windows event log entries on a Windows box and decides the health of a server based on whether the event has occurred.
Scenario:
When an event is written to the event log, we mark the server RED by comparing against a list of servers we maintain: any server with an event is marked RED and the rest are marked GREEN. This is done by a saved search that runs every 15 minutes. On its first run the search finds the event and turns the server RED. On the next run it checks only the past 15 minutes of data, does not find the same event for that server, and changes it to GREEN, even though the original problem still persists and the server should stay RED.
My question is: how do we carry over the last state/health of that server and keep it RED as long as no other event has resolved the existing one? Below is my search:
index="tools_netcool" sourcetype="netcool_alerts" ALERTKEY="Failed to Connect to Computer" TYPE=1 NODE="ISP*" NOT NODE=ISP9*
| rename LOCATION as loc NODE as host
| stats latest(TYPE) as TYPE,latest(_time) as _time by loc host
| rex field=host "ISP(?<loc>\d+)(?<hostType>\w)$"
| eval health=if(hostType="F","YELLOW","RED")
| append
[| inputlookup host_list.csv
| search NOT host=ISP9*
| rex field=host "ISP(?<loc>\d+)(?<hostType>\w)$"
| table loc host hostType ]
| eventstats count as occurence_count by host
| fillnull value=0 TYPE
| where NOT (occurence_count=2 AND TYPE=0)
| fillnull value="GREEN" health
| eventstats values(eval(case(hostType="A",health))) as A_Health by store
| eval A_Health=if(hostType="B",A_Health,"NA")
| eval health=if(hostType="B" AND health="RED" AND A_Health="RED","RED",
if(hostType="B" AND health="RED" AND A_Health="GREEN","YELLOW",health))
| eval _time=now()
| eval Metric="Servers Availability"
| eval kpi_type=Metric
| eval kpi_key1=""
| eval kpi_value1=""
| eval kpi_key2=""
| eval kpi_value2=""
| eval ecosystem="Servers"
| eval name=host
| table _time store name health ecosystem kpi_key1 kpi_value1 kpi_key2 kpi_value2 kpi_type
I would recommend using the KV store to maintain state if you want to know the current state of all your machines. Every x minutes, check for any new events and overwrite the existing value for a host with its color/status: inputlookup the KV store, find all new statuses, dedup to get the latest value per host, then outputlookup append=t to save any changes. Then, when you want to view status, you only have to inputlookup the KV store into your display.
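As a rough sketch of what that could look like (not a drop-in replacement for your search): the lookup definition name server_health_state and the fields health and last_event_time are made up for illustration, and I am assuming the resolving event comes through the same sourcetype with a TYPE other than 1, which your post does not confirm. The TYPE=1 filter is dropped from the base search so that resolving events can overwrite a stored RED. Your A/B/F host-type rules are left out to keep it short. The scheduled search merges new events with the stored state, keeps the newest row per host, and writes it back keyed on host:

```
index="tools_netcool" sourcetype="netcool_alerts" ALERTKEY="Failed to Connect to Computer" NODE="ISP*" NOT NODE=ISP9*
| rename NODE as host
| stats latest(TYPE) as TYPE, latest(_time) as last_event_time by host
| eval health=if(TYPE==1, "RED", "GREEN")
| table host health last_event_time
| inputlookup append=t server_health_state
| sort 0 -last_event_time
| dedup host
| outputlookup append=t key_field=host server_health_state
```

The display search then only reads the stored state. Hosts from host_list.csv that have never had an event are not in the KV store, so they fall through to GREEN:

```
| inputlookup host_list.csv
| search NOT host=ISP9*
| lookup server_health_state host OUTPUT health
| fillnull value="GREEN" health
```

Because a host stays RED until a newer event replaces its row, a quiet 15-minute window no longer flips it back to GREEN.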