We have some critical services we are monitoring on a realtime system, so responding in a timely manner is essential. If the services stop, we need to be notified. Currently we monitor this with WinHostMon, and when one of the services stops we send out an email indicating it has stopped.
Sometimes, but not always, an event will get logged indicating that one of these same services has been stopped by a particular user from a certain machine (usually from the program interface). These events come in via the application event log. We don’t alert on this currently.
Want:
We want to be able to incorporate this information into a single alert to do the following:
1. If a user stops the service, send an email that includes the user and source machine
2. If the service stops in another manner (for example through the control panel) where there is no specific user - still send the alert, but just put the user and source machine info as unknown.
What I have tried:
I did a left join (the left side is the WinHostMon monitoring): if the user and source machine are present they are included in the alert, otherwise they default to Unknown (using fillnull).
Problem:
I get the alert, but sometimes it doesn’t include a user and source machine and sometimes it does. I think this is a timing issue, as I sometimes see two events show up: one with the default Unknown user and a second with the right info (extractions for these fields have already been set up and are working). Currently the alert is set up to run on a cron schedule once per minute, and WinHostMon checks the services every 30 seconds.
The following is an example of what I have tried that sometimes works and sometimes does not:
Type=Service service_name="Critical Service to Watch" State="Stopped" | dedup service_name,host | join type=left service_name,host [ search index=wineventlog Message="Shutdown of the * service on computer * requested by user *" service_name="Critical Service to Watch" | fields host,message,service_name,source_machine,user] |fillnull value=Unknown source_machine,message,user | fields host,Name,service_name,source_machine,message,user
Splunk 6.2.3.
I am not sure what the best way to handle this is. Perhaps some kind of lookup: when it sees someone stop the service, it appends an entry to the lookup file, and when the service starts, it removes that entry?
Anyone have any other ideas?
Perhaps you are hitting some limits; have you tried it without the join? Does this work any better?
(Type=Service service_name="Critical Service to Watch" State="Stopped") OR (index=wineventlog Message="Shutdown of the * service on computer * requested by user *" service_name="Critical Service to Watch") | stats values(*) AS * BY service_name host | fillnull value=Unknown source_machine,message,user | fields host,Name,service_name,source_machine,message,user
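To make it clearer what the join-less search is doing, here is a rough Python sketch of the same idea: events from both sources are pooled, grouped by service_name and host, their field values merged, and any missing user fields filled with Unknown. This is only an illustration of the grouping logic, not how Splunk implements stats or fillnull; the host names, user name, and function name (srv01, srv02, jdoe, ws42, merge_events) are made up for the example.

```python
from collections import defaultdict

UNKNOWN = "Unknown"


def merge_events(events, fill_fields=("source_machine", "message", "user")):
    """Roughly mimic `stats values(*) AS * BY service_name host | fillnull`."""
    # (service_name, host) -> field name -> set of distinct values
    grouped = defaultdict(lambda: defaultdict(set))
    for event in events:
        key = (event["service_name"], event["host"])
        for field, value in event.items():
            if field not in ("service_name", "host") and value is not None:
                grouped[key][field].add(value)

    rows = []
    for (service_name, host), fields in grouped.items():
        row = {"service_name": service_name, "host": host}
        for field, values in fields.items():
            row[field] = sorted(values)  # multivalue field, like values()
        for field in fill_fields:
            row.setdefault(field, [UNKNOWN])  # like fillnull value=Unknown
        rows.append(row)
    return rows


# Hypothetical sample data: one stop with a matching event log entry,
# one stop with no user information at all.
sample_events = [
    {"service_name": "Critical Service to Watch", "host": "srv01",
     "State": "Stopped"},
    {"service_name": "Critical Service to Watch", "host": "srv01",
     "user": "jdoe", "source_machine": "ws42",
     "message": "Shutdown requested"},
    {"service_name": "Critical Service to Watch", "host": "srv02",
     "State": "Stopped"},
]

rows = merge_events(sample_events)
for row in rows:
    print(row["host"], row["user"], row["source_machine"])
```

Note that in this sketch, if the event log entry has not arrived by the time the merge runs, the row still comes out as Unknown, so the search time window probably still matters.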
Thanks woodcock - it looks like this might be the solution. I am going to be doing some testing but from the initial testing I have done so far it seems to have solved my problem. Once I am sure, I'll mark it answered.
I ended up having to tweak this a little, but this pointed me in the right direction. Thanks