Splunk Search

Looking for a better alternative for checking a Windows service status and triggering on state change

Builder

Using the Windows Infrastructure TA, I have the following stanza in my inputs.conf:

[WinHostMon://service]
type = service
interval = 300
index = windows

I also have a search that triggers an alert when the service stops:

index=windows host="hostname" sourcetype="WinHostMon" Name=mysql status=stopped earliest=-5m latest=now

This works and triggers an alert, but how can it be improved so that it doesn't trigger every 5 minutes once the service has stopped? Ideally, it would also reset once the status has changed back to running.

Thanks.

Community Manager

Hi @Esky73,

Did you have a chance to check out some answers? If it worked, please resolve this post by approving it! If your problem is still not solved, keep us updated so that someone else can help you.

Thanks for posting!

0 Karma

New Member

I know this question is old, but I haven't seen anyone answer the OP's question efficiently. Lookup tables are great, and I use them for baselining algorithms and the like, but they are overkill for detecting a State change on a service.

index=* source=* Name="NameOfService" earliest=-10m latest=now
| bin _time span=5m
| stats latest(State) as State earliest(State) as PreviousState by host Name
| table host Name State PreviousState
| where 'State'!='PreviousState'

Schedule the alert to run every 5 minutes. Each run compares the latest state in the 10-minute window to the earliest one and only alerts when they differ. The only drawback is that if your service stops and restarts in quick succession between two samples, the change may not be recorded, but that is highly unlikely.
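The comparison described above, and its drawback, can be modelled outside Splunk. The following is a minimal Python sketch, not SPL: the function name `changed_states` and the sample timestamps are made up for illustration, and it only assumes that each event carries a time, host, service Name, and State.

```python
from datetime import datetime, timedelta

def changed_states(events):
    """Return {(host, name): (previous, current)} for services whose earliest
    and latest State in the window differ (mirrors earliest()/latest())."""
    first, last = {}, {}
    for ts, host, name, state in sorted(events):  # oldest event first
        key = (host, name)
        first.setdefault(key, state)  # earliest State seen in the window
        last[key] = state             # latest State seen in the window
    return {k: (first[k], last[k]) for k in first if first[k] != last[k]}

t0 = datetime(2024, 1, 1, 12, 0)  # arbitrary sample timestamp

# Two samples five minutes apart: the change is detected.
window = [
    (t0, "HOST", "MySQL57", "Running"),
    (t0 + timedelta(minutes=5), "HOST", "MySQL57", "Stopped"),
]
print(changed_states(window))

# Stop and restart inside one window: earliest == latest, nothing is reported.
flap = [
    (t0, "HOST", "MySQL57", "Running"),
    (t0 + timedelta(minutes=2), "HOST", "MySQL57", "Stopped"),
    (t0 + timedelta(minutes=4), "HOST", "MySQL57", "Running"),
]
print(changed_states(flap))
```

The second call returns an empty result, which is the missed-change case mentioned above.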

0 Karma

Esteemed Legend

OK, you do it like this: create a lookup called WindowsStatus.
You need 2 searches.
The first search does this:

Your Search Here | dedup host State | rename _time AS Time
| appendpipe [|inputcsv WindowsStatus | eval State="Stopped"]
| stats min(Time) AS Time values(State) AS State BY host
| search NOT State="Running"
| table Time host | outputcsv WindowsStatus

The other search fires your alert like this:

| inputcsv WindowsStatus

0 Karma

Builder

Thanks for the assistance. Here's my search. Nothing is being populated in the CSV file, although the file is created.

The search on its own gives me the following event:
index=windows host="HOST" sourcetype="WinHostMon" Name=MYSQL57 State="stopped"

Type=Service
Name="MySQL57"
DisplayName=
Description=
Path=""C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqld.exe" --defaults-file="C:\ProgramData\MySQL\MySQL Server 5.7\my.ini" MySQL57"
ServiceType="Own Process"
StartMode="Auto"
Started=false
State="Stopped"
Status="OK"
ProcessId=0

With your search (changed status to State)

index=windows host="HOST" sourcetype="WinHostMon" Name=MYSQL57 State="stopped"
| dedup host state | rename _time AS Time
| appendpipe [|inputcsv WindowsStatus | eval state="stopped"]
| stats min(Time) AS Time values(status) AS state BY host
| search NOT state="running"
| table Time host | outputcsv WindowsStatus

0 Karma

Esteemed Legend

I used your original field names and values but now that I see the event spelling is different, I updated my answer. Try it now but DO NOT USE the State="stopped" part; the search needs both Running and Stopped states to be captured or it will not auto-clear.

0 Karma

Esteemed Legend

There is a Throttle setting (it may be in the Advanced area of the Alert). Enable this and set your ignore threshold.

0 Karma

Builder

But doesn't that just hide the alert for a period of time?

0 Karma

Esteemed Legend

It causes it not to re-alert while the same condition exists (even though later searches would otherwise alert).

0 Karma

Builder

Hi, I still can't see how that helps. For example, if I set a throttle for 30 mins, then I effectively have a 30 min gap of not knowing what state my service is in?

I'm looking for something like this, maybe 2 searches joined together?

mins  State    Alert
5     running  no
10    stopped  yes
15    stopped  no
20    stopped  no
25    stopped  no
30    stopped  no
35    running  no (or maybe a changed state - clear alarm)
40    stopped  yes
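The behaviour in this table is edge-triggered alerting: fire only when the state changes into stopped, not while it stays stopped. A minimal Python sketch of that rule follows. The function name `alerts_on_transition` is hypothetical, and it assumes the very first sample never fires because there is no previous state to compare against.

```python
def alerts_on_transition(samples, bad_state="stopped"):
    """Return (minute, state, fire) rows; fire is True only when the state
    enters bad_state from a different, known previous state."""
    rows = []
    previous = None  # assumption: no alert on the first sample
    for minute, state in samples:
        fire = (
            state == bad_state
            and previous is not None
            and previous != bad_state
        )
        rows.append((minute, state, fire))
        previous = state
    return rows

# The samples from the table above.
samples = [
    (5, "running"), (10, "stopped"), (15, "stopped"), (20, "stopped"),
    (25, "stopped"), (30, "stopped"), (35, "running"), (40, "stopped"),
]
rows = alerts_on_transition(samples)
fired_minutes = [minute for minute, _, fire in rows if fire]
print(fired_minutes)
```

Only minutes 10 and 40 fire, matching the "yes" rows in the table.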

0 Karma

SplunkTrust

Get only the most recent event by using head 1 (you can also try dedup). Then alert if status=stopped, i.e. if the result count is > 0 for the following query:

index=windows host="hostname" sourcetype="WinHostMon" Name=mysql status=* earliest=-5m latest=now
| head 1
| search status=stopped

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

Builder

Thanks for your input. Can you explain further why this only runs once? When the search runs again in 5 mins, why won't the alert be triggered again?
Actually, I tried it and it has alerted again. The alert sends a webhook to another app, so it has now sent a duplicate alert, which has created a separate incident. That is what was occurring before.

0 Karma

SplunkTrust

@Esky73, head 1 will only give you the latest event with the MySQL status. The search returns a result only if status="stopped". Since you have scheduled it to run every 5 minutes, it will run the same test again in 5 minutes.

It seems like you want to snooze the alert for a specific time once it has triggered. There is an option called Throttle in the Splunk alert settings, under Trigger Conditions. It prevents subsequent alerts from being triggered while the IT team looks at the issue and resolves it. You can set the suppression to 15 minutes, 1 hour, 1 day, etc., as needed.

On a separate note, you can test directly in search, test the alert on a test system, or keep the alert action local so that it is displayed in the Alert list.

0 Karma

Builder

I played around with throttle and it doesn't really give me what I want.

If I throttle for 30 mins, then it will send another alert after 30 mins. But in that 30 mins the service could have been restarted and fallen over again, and we won't know about it for another 30 mins.
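The gap described above can be shown with a small simulation. This is a simplified Python model of time-based suppression, not Splunk's actual throttle implementation. The function name `throttled_alerts` and the timeline are hypothetical.

```python
def throttled_alerts(samples, throttle_minutes):
    """Return the minutes at which an alert fires when every 'stopped' sample
    would alert, but alerts are suppressed for throttle_minutes after a fire."""
    fired = []
    last_fired = None
    for minute, state in samples:
        if state != "stopped":
            continue
        if last_fired is None or minute - last_fired >= throttle_minutes:
            fired.append(minute)
            last_fired = minute
    return fired

# Hypothetical timeline: stops at 10, is restarted at 20, fails again at 25.
timeline = [
    (5, "running"), (10, "stopped"), (15, "stopped"), (20, "running"),
    (25, "stopped"), (30, "stopped"), (35, "stopped"), (40, "stopped"),
]
fired = throttled_alerts(timeline, throttle_minutes=30)
print(fired)
```

With a 30-minute throttle, the second failure at minute 25 is not reported until minute 40, because the suppression window opened by the first alert is still active.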

cheers.

I have been trawling answers for some time looking for something similar. The closest I have found is to look for a status and write it to a lookup file, then create another search that checks that lookup file and alerts if the status is X. But I'm not 100% sure how to go about that either at the moment.
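The lookup-file idea boils down to persisting the last known state and alerting only when the new state differs from the stored one. Here is a rough Python sketch of that pattern using a plain CSV file. It is not SPL, the helper names (`load_states`, `save_states`, `check`) are made up, and it assumes a host with no stored state is treated as previously Running.

```python
import csv
import os
import tempfile

def load_states(path):
    """Read the stored {host: State} mapping, or {} if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="") as f:
        return {row["host"]: row["State"] for row in csv.DictReader(f)}

def save_states(path, states):
    """Overwrite the CSV with the given {host: State} mapping."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["host", "State"])
        writer.writeheader()
        for host, state in sorted(states.items()):
            writer.writerow({"host": host, "State": state})

def check(path, current):
    """Return hosts that are Stopped now but were not Stopped in the stored
    file, then store the current states for the next run."""
    stored = load_states(path)
    newly_stopped = [
        host for host, state in sorted(current.items())
        # assumption: an unknown host counts as previously Running
        if state == "Stopped" and stored.get(host, "Running") != "Stopped"
    ]
    save_states(path, {**stored, **current})
    return newly_stopped

path = os.path.join(tempfile.mkdtemp(), "WindowsStatus.csv")
runs = [
    check(path, {"HOST": "Stopped"}),  # first stop: alert
    check(path, {"HOST": "Stopped"}),  # still stopped: no alert
    check(path, {"HOST": "Running"}),  # recovered: state is reset
    check(path, {"HOST": "Stopped"}),  # stopped again: alert
]
print(runs)
```

Because the stored state is updated on every run, a recovery clears the condition and the next stop alerts again without any time-based throttle.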

0 Karma

SplunkTrust

You don't need a report every 5 minutes while the status is stopped, and you don't want to throttle, so maybe I am confused. Let me convert my answers to comments so that the question shows up as unanswered and others can provide the answer you are looking for.

0 Karma

SplunkTrust

@Esky73, is your requirement to alert when the status was up and then went down, but not to trigger again unless the status goes up and then goes down again?

In other words, even if the status remains down for a day without someone restarting the service, you do not want a second alert? Or is there a maximum limit on how long the service can be continuously down (like 1 hour/1 day/1 week, etc.)?

0 Karma

SplunkTrust

Hi there, adding to the above, you can use ... | stats latest(status) ... and then alert if the status is stopped. I think you are looking for the throttling feature. When creating an alert, set up throttling according to your needs. More about it here: https://docs.splunk.com/Documentation/Splunk/6.5.2/Alert/ThrottleAlerts

0 Karma