Here are the logs I have:
04/24/2017 02:42:08 PM
LogName=System
SourceName=Microsoft-Windows-Service Control Manager
EventCode=7036
EventType=4
Type=Information
ComputerName=Mycomputer
TaskCategory=The operation completed successfully.
OpCode=The operation completed successfully.
RecordNumber=30715
Keywords=Classic
Message=The Windows Defender Network Inspection Service service entered the stopped state.
04/25/2017 06:37:31 AM
LogName=System
SourceName=Microsoft-Windows-Service Control Manager
EventCode=7036
EventType=4
Type=Information
ComputerName=Mycomputer
TaskCategory=The operation completed successfully.
OpCode=The operation completed successfully.
RecordNumber=31064
Keywords=Classic
Message=The Windows Defender Service service entered the stopped state.
04/23/2017 01:03:08 PM
LogName=System
SourceName=Microsoft-Windows-Service Control Manager
EventCode=7036
EventType=4
Type=Information
ComputerName=Mycomputer
TaskCategory=The operation completed successfully.
OpCode=The operation completed successfully.
RecordNumber=30644
Keywords=Classic
Message=The Windows Defender Network Inspection Service service entered the stopped state.
04/24/2017 02:42:07 PM
LogName=System
SourceName=Microsoft-Windows-Service Control Manager
EventCode=7036
EventType=4
Type=Information
ComputerName=Mycomputer
TaskCategory=The operation completed successfully.
OpCode=The operation completed successfully.
RecordNumber=30714
Keywords=Classic
Message=The Windows Defender Network Inspection Service service entered the running state.
My search
index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036 | transaction host maxevents=2 | eval DurationinMinutes=duration/60 | where DurationinMinutes>500 | table host, Message , DurationinMinutes | sort - DurationinMinutes
but this returns the following data:
host | Message | DurationinMinutes
Mycomputer | The Windows Defender Network Inspection Service service entered the running state. / The Windows Defender Network Inspection Service service entered the stopped state. | 1538.983333
Mycomputer | The Windows Defender Network Inspection Service service entered the stopped state. / The Windows Defender Service service entered the stopped state. | 1526.616667
Mycomputer | The Windows Defender Network Inspection Service service entered the stopped state. / The Windows Defender Service service entered the stopped state. | 1404.516667
The first 2 events are good, but I don't want the last event. How do I filter it out?
Since you are looking at more than one week's data, the transaction command may actually drop events. You can try adding keepevicted=true to your transaction query, but this will slow the search down even further. You should try switching to stats instead, to take advantage of map-reduce and a faster search:
1) If you want to alert on the stopped status per host when it has lasted longer than a week, you can just dedup by host and State and calculate the duration as now()-_time:
index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036
| rex field=Message "The (?<Name>[a-zA-Z|\s]+) Service service entered the (?<State>[a-zA-Z]+) state."
| dedup host State
| eval downTime=(now()-_time)
| table _time host Name State downTime
| search State="stopped" AND downTime>604800
You can also set up the final | search downTime>604800 condition directly in your alert, so that the alert query shows downTime for the various hosts and triggers only if downTime is greater than a week. PS: 1 week = 60*60*24*7 = 604800 sec. Alternatively, you can use eval to convert to days as well (the same way you converted to minutes in your example).
2) If you want to show the duration since the last running or stopped event per host for a dashboard (not an alert), use the following:
index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036
| rex field=Message "The (?<Name>[a-zA-Z|\s]+) Service service entered the (?<State>[a-zA-Z]+) state."
| dedup host State
| eval lastStatusDuration=(now()-_time)
| table _time host Name State lastStatusDuration
3) If you want to calculate the various durations between stopped and running and have more control based on conditions, you should use the following stats command instead of transaction:
| rex field=Message "The (?<Name>[a-zA-Z|\s]+) Service service entered the (?<State>[a-zA-Z]+) state."
| eval groupKey= host."-".EventCode."-".Name."-".State
| dedup groupKey
| stats min(_time) as MinTime max(_time) as MaxTime latest(State) as FinalState values(State) as State by host
| eval _time=MaxTime
The State field is multi-valued, which tells you whether both running and stopped states are present. You can take control through commands like | search State="running" AND State="stopped" and | search State!="running" etc. To calculate the duration you can use MinTime and MaxTime, depending on the value of the FinalState field. You can also use now()-_time as in the previous examples.
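As a rough sketch of how the stats approach could feed the alert, the fragment below keeps only the most recent state per host and service and flags it when that state is stopped and older than a week. It is a variant built on assumptions, not the query from the answer above: it assumes the event text is in a field named Message (as in the sample events), it uses a looser rex pattern, and the names LastChange and downTimeDays are made up for illustration. The search time range also has to reach back more than 7 days, otherwise no event can be older than 604800 seconds.

```spl
index=wineventlog eventtype=winsystem EventCode=7036 *The Windows Defender service entered*
| rex field=Message "The (?<Name>.+) service entered the (?<State>\w+) state"
| stats latest(State) as FinalState latest(_time) as LastChange by host Name
| eval downTime=now()-LastChange
| eval downTimeDays=round(downTime/86400, 2)
| where FinalState="stopped" AND downTime>604800
```

Because latest() picks the chronologically newest value regardless of event order, a service that was stopped long ago but has since restarted is not flagged.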
I have a similar question: my search has to find the keyword "Service.com" and then the keyword "connection reset", and I should get a result only if both are present within a time window of 1 minute.
Can you help me with this?
@niketn this seems very similar to how I'm trying to calculate the uptime/downtime percentage by host for the last 7 days and last 30 days in my question here:
https://community.splunk.com/t5/Dashboards-Visualizations/Help-showing-the-Uptime-downtime-percentag...
You seem to have a lot of experience with this topic. I appreciate your help in advance!
Thanks a lot. For some reason rex was not working for me, so I used field extractions:
index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036 Name=* State=* tag=alert earliest=-1w@w | dedup host State | eval downTime=(now()-_time) | table _time host Name State downTime | search State="stopped state" AND downTime>561600
To set up an alert for the last 7 days I had to add earliest=-1w@w, and the downtime threshold is more like 6.5 days (561600 seconds).
What's the problem with the last event? Your filtering condition depends on that. Also, you say you want to alert when the duration is more than a week, but your where condition checks for just 500 minutes, not 24*7*60 minutes.
I am not able to paste the table here. Basically, the last event groups 2 service stopped events. I want to see the duration between service started and service stopped, but my query is returning single events such as service stopped along with the duration.
Your transaction command needs more conditions to make the pairs correctly. Look at the Splunk documentation for the transaction command, in particular the startswith and endswith parameters, in which you need to include expressions that match the service started and stopped events. Look at the examples there.
Just to clarify, the alert should be triggered only when the Windows Defender service has been stopped for more than a week, or around 10080 minutes. Since the events span 7 days, my query is giving incorrect results. Thanks in advance for any and all help!
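A minimal sketch of what such a transaction could look like, pairing each stopped event with the next running event for the same host and service. It assumes the Name and State fields are extracted with rex from a field named Message (as in the sample events); the rex pattern and the DurationInMinutes name are illustrative and not taken from the documentation.

```spl
index=wineventlog eventtype=winsystem EventCode=7036 *The Windows Defender service entered*
| rex field=Message "The (?<Name>.+) service entered the (?<State>\w+) state"
| transaction host Name startswith=eval(State="stopped") endswith=eval(State="running") keepevicted=true
| eval DurationInMinutes=round(duration/60, 2)
| table host Name DurationInMinutes closed_txn
```

With keepevicted=true, incomplete groupings (for example a stop with no later start) are still returned and can be recognised by closed_txn=0. For those rows, duration covers only the events that were actually grouped, so it does not measure the time elapsed up to now.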