Monitoring Splunk

Persisting the health of a server using the events fired from Windows event logs or some other monitoring tool

macadminrohit
Contributor

Hi Experts,

We have a search that checks for critical Windows event log entries on a Windows box and decides the health of a server based on whether such an event occurred.

Scenario:

When an event is fired in the event log, we compare it against our list of servers. Any server with an event is marked RED and the rest are marked GREEN. This is done by a saved search that runs every 15 minutes. On its first run the search finds the event and marks the server RED. On the next run it only looks at the past 15 minutes of data, does not find the same event for that server, and changes it to GREEN. The original problem has not been resolved, so the server should still be RED, but our search marks it GREEN.

My question is: how do we carry forward the last state/health of that server and keep it RED as long as no other event has resolved the existing one? Below is my search:

index="tools_netcool" sourcetype="netcool_alerts" ALERTKEY="Failed to Connect to Computer" TYPE=1 NODE="ISP*" NOT NODE=ISP9* 
| rename LOCATION as loc NODE as host 
| stats latest(TYPE) as TYPE,latest(_time) as _time by loc host 
| rex field=host "ISP(?<loc>\d+)(?<hostType>\w)$" 
| eval health=if(hostType="F","YELLOW","RED")
| append 
    [| inputlookup host_list.csv 
    | search NOT host=ISP9* 
    | rex field=host "ISP(?<loc>\d+)(?<hostType>\w)$" 
    | table loc host hostType ] 
| eventstats count as occurence_count by host 
| fillnull value=0 TYPE 
| where NOT (occurence_count=2 AND TYPE=0) 
| fillnull value="GREEN" health 
| eventstats values(eval(case(hostType="A",health))) as A_Health by store
| eval A_Health=if(hostType="B",A_Health,"NA")
| eval health=if(hostType="B" AND health="RED" AND A_Health="RED","RED",
    if(hostType="B" AND health="RED" AND A_Health="GREEN","YELLOW",health))
| eval _time=now() 
| eval Metric="Servers Availability"
| eval kpi_type=Metric 
| eval kpi_key1="" 
| eval kpi_value1="" 
| eval kpi_key2="" 
| eval kpi_value2="" 
| eval ecosystem="Servers" 
| eval name=host 
| table _time store name health ecosystem kpi_key1 kpi_value1 kpi_key2 kpi_value2 kpi_type

hortonew
Builder

I would recommend using the KV store to maintain state if you want to know the current state of all your machines. Every x minutes, check for any new events and overwrite the existing value for a host with its color/status: inputlookup the KV store, find all new statuses, dedup to get the latest values, then outputlookup append=t to save any changes. Then, when you want to view status, you only have to read the KV store into your display.
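
A rough sketch of what that could look like as the 15-minute saved search. Everything here that is not in the original post is an assumption: server_health_state is a hypothetical KV store lookup (it would need a collection and a lookup definition with the fields host, health and last_seen), and the search assumes that a resolving event arrives with a TYPE other than 1, which is why the TYPE=1 filter from the original search is dropped. Adjust both to your data.

```spl
index="tools_netcool" sourcetype="netcool_alerts" ALERTKEY="Failed to Connect to Computer" NODE="ISP*" NOT NODE=ISP9*
| rename NODE as host
| stats latest(TYPE) as TYPE, latest(_time) as last_seen by host
| eval health=if(TYPE=1, "RED", "GREEN")
| fields host health last_seen
| inputlookup append=t server_health_state
| sort 0 - last_seen
| dedup host
| eval _key=host
| table _key host health last_seen
| outputlookup append=t server_health_state
```

The first half computes the latest status per host from the new events, inputlookup append=t adds the previously stored rows, and sort plus dedup keeps only the most recent row per host. A host with no new events in the window keeps its stored status, so a RED server stays RED until a newer event changes it. Setting _key to host makes outputlookup append=t update the existing record instead of adding a duplicate.

For the display, something along these lines should be enough, with hosts that have never had an event defaulting to GREEN:

```spl
| inputlookup host_list.csv
| search NOT host=ISP9*
| lookup server_health_state host OUTPUT health
| fillnull value="GREEN" health
```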
