Splunk Search

How to get duration between a start and stop event and trigger an alert if duration is greater than 1 week?

dmenon84
Path Finder

Here are the logs I have:

04/24/2017 02:42:08 PM
LogName=System
SourceName=Microsoft-Windows-Service Control Manager
EventCode=7036
EventType=4
Type=Information
ComputerName=Mycomputer
TaskCategory=The operation completed successfully.
OpCode=The operation completed successfully.
RecordNumber=30715
Keywords=Classic
Message=The Windows Defender Network Inspection Service service entered the stopped state.


04/25/2017 06:37:31 AM
LogName=System
SourceName=Microsoft-Windows-Service Control Manager
EventCode=7036
EventType=4
Type=Information
ComputerName=Mycomputer
TaskCategory=The operation completed successfully.
OpCode=The operation completed successfully.
RecordNumber=31064
Keywords=Classic
Message=The Windows Defender Service service entered the stopped state.



04/23/2017 01:03:08 PM
LogName=System
SourceName=Microsoft-Windows-Service Control Manager
EventCode=7036
EventType=4
Type=Information
ComputerName=Mycomputer
TaskCategory=The operation completed successfully.
OpCode=The operation completed successfully.
RecordNumber=30644
Keywords=Classic
Message=The Windows Defender Network Inspection Service service entered the stopped state.


04/24/2017 02:42:07 PM
LogName=System
SourceName=Microsoft-Windows-Service Control Manager
EventCode=7036
EventType=4
Type=Information
ComputerName=Mycomputer 
TaskCategory=The operation completed successfully.
OpCode=The operation completed successfully.
RecordNumber=30714
Keywords=Classic
Message=The Windows Defender Network Inspection Service service entered the running state.

My search:

index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036
| transaction host maxevents=2
| eval DurationinMinutes=duration/60
| where DurationinMinutes>500
| table host, Message, DurationinMinutes
| sort - DurationinMinutes

But this returns the following data:

host       | Message                                                                            | DurationinMinutes
Mycomputer | The Windows Defender Network Inspection Service service entered the running state. | 1538.983333
           | The Windows Defender Network Inspection Service service entered the stopped state. |
Mycomputer | The Windows Defender Network Inspection Service service entered the stopped state. | 1526.616667
           | The Windows Defender Service service entered the stopped state.                    |
Mycomputer | The Windows Defender Network Inspection Service service entered the stopped state. | 1404.516667
           | The Windows Defender Service service entered the stopped state.                    |

The first 2 events are good, but I don't want the last event. How do I filter such events out?

1 Solution

niketn
Legend

Since you are looking at more than one week of data, the transaction command may actually drop events. You can try adding keepevicted=true to your transaction command, but this will slow the search down even further. You should instead switch to stats, which takes advantage of map-reduce and makes the search faster:

1) If you want to alert per host when a service has been stopped for more than a week, you can dedup to keep only the most recent event for each service and calculate the duration as now()-_time:

index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036
| rex field=Message "The (?<Name>[a-zA-Z\s]+) Service service entered the (?<State>[a-zA-Z]+) state."
| dedup host Name
| eval downTime=(now()-_time)
| table _time host Name State downTime
| search State="stopped" AND downTime>604800

You can also move the final | search downTime>604800 condition into your alert's trigger condition, so that the alert query itself shows downTime for all hosts but the alert triggers only if downTime is greater than a week. PS: 1 week = 60*60*24*7 = 604800 seconds. Alternatively, you can use eval to convert the duration to days (the same way you converted it to minutes in your example).
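To make the alert logic concrete outside Splunk, here is a rough Python sketch (an offline illustration, not SPL) using the sample events from the question. It assumes the service names the rex pattern would extract ("Windows Defender Network Inspection" and "Windows Defender"), keeps only the newest event per host and service, and flags the service if that event is a stop older than 604800 seconds. Because it looks at the newest event per service, a service that was stopped and later restarted does not alert. The helper `stopped_too_long` and the fixed `now` value are hypothetical, chosen to keep the result reproducible.

```python
from datetime import datetime

# Sample events from the question as (time, host, Name, State).
# Name values are what the rex pattern would extract (an assumption).
FMT = "%m/%d/%Y %I:%M:%S %p"
events = [
    ("04/23/2017 01:03:08 PM", "Mycomputer", "Windows Defender Network Inspection", "stopped"),
    ("04/24/2017 02:42:07 PM", "Mycomputer", "Windows Defender Network Inspection", "running"),
    ("04/24/2017 02:42:08 PM", "Mycomputer", "Windows Defender Network Inspection", "stopped"),
    ("04/25/2017 06:37:31 AM", "Mycomputer", "Windows Defender", "stopped"),
]

WEEK_SECONDS = 60 * 60 * 24 * 7  # 604800


def stopped_too_long(events, now, threshold=WEEK_SECONDS):
    """Hypothetical helper: newest event per (host, Name); alert if it is a stop
    and more than `threshold` seconds have passed since it."""
    newest = {}
    for ts, host, name, state in events:
        t = datetime.strptime(ts, FMT)
        key = (host, name)
        if key not in newest or t > newest[key][0]:
            newest[key] = (t, state)
    alerts = []
    for (host, name), (t, state) in sorted(newest.items()):
        down_time = (now - t).total_seconds()  # like eval downTime=now()-_time
        if state == "stopped" and down_time > threshold:
            alerts.append((host, name, down_time))
    return alerts


# A fixed "now" keeps the example reproducible.
for host, name, secs in stopped_too_long(events, now=datetime(2017, 5, 2)):
    print(f"{host}: {name} stopped for {secs / 86400:.1f} days")
```

Passing `now` explicitly stands in for Splunk's now(); in the real alert, the search time plays that role.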

2) If you want a dashboard (not an alert) that shows, per host, the duration since the last running or stopped event, use the following:

index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036
| rex field=Message "The (?<Name>[a-zA-Z\s]+) Service service entered the (?<State>[a-zA-Z]+) state."
| dedup host Name State
| eval lastStatusDuration=(now()-_time)
| table _time host Name State lastStatusDuration
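A similar offline sketch for the dashboard variant, assuming the same four sample events and extracted Name values: it keeps the newest event per host, service name and state, and reports the seconds elapsed since each one, like lastStatusDuration. `last_status_durations` is a hypothetical helper name.

```python
from datetime import datetime

# Sample events from the question as (time, host, Name, State); Name values are assumed.
FMT = "%m/%d/%Y %I:%M:%S %p"
events = [
    ("04/23/2017 01:03:08 PM", "Mycomputer", "Windows Defender Network Inspection", "stopped"),
    ("04/24/2017 02:42:07 PM", "Mycomputer", "Windows Defender Network Inspection", "running"),
    ("04/24/2017 02:42:08 PM", "Mycomputer", "Windows Defender Network Inspection", "stopped"),
    ("04/25/2017 06:37:31 AM", "Mycomputer", "Windows Defender", "stopped"),
]


def last_status_durations(events, now):
    """Hypothetical helper: newest event per (host, Name, State) and the seconds
    elapsed since it, like lastStatusDuration=now()-_time."""
    newest = {}
    for ts, host, name, state in events:
        t = datetime.strptime(ts, FMT)
        key = (host, name, state)
        if key not in newest or t > newest[key]:
            newest[key] = t
    return {key: (now - t).total_seconds() for key, t in newest.items()}


durations = last_status_durations(events, now=datetime(2017, 4, 26))
for (host, name, state), secs in sorted(durations.items()):
    print(host, name, state, int(secs))
```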

3) If you want to calculate the various durations between the stopped and running states and have more control through conditions, use the following stats command instead of transaction:

index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036
| rex field=Message "The (?<Name>[a-zA-Z\s]+) Service service entered the (?<State>[a-zA-Z]+) state."
| eval groupKey=host."-".EventCode."-".Name."-".State
| dedup groupKey
| stats min(_time) as MinTime max(_time) as MaxTime latest(State) as FinalState values(State) as State by host Name
| eval _time=MaxTime

The State field is multivalued and tells you whether both the running and stopped states are present. You can filter on it with commands like | search State="running" AND State="stopped" or | search State!="running". To calculate the duration, use MinTime and MaxTime depending on the value of the FinalState field. You can also use now()-_time as in the previous examples.
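The stats approach can be sketched in Python as well. This is a simplified model under stated assumptions, not Splunk's implementation: it keeps the newest event per host, service Name and State (the dedup groupKey step), then, per host and Name, computes MinTime, MaxTime, the set of states, and FinalState as the state of the newest event. Grouping by Name as well as host is an assumption that keeps the two Defender services apart. `summarize` and `last_outage_seconds` are hypothetical helpers; the second uses MaxTime-MinTime when the service ended in the running state and now-MaxTime when it is still stopped.

```python
from datetime import datetime

# Sample events from the question as (time, host, Name, State); Name values are assumed.
FMT = "%m/%d/%Y %I:%M:%S %p"
events = [
    ("04/23/2017 01:03:08 PM", "Mycomputer", "Windows Defender Network Inspection", "stopped"),
    ("04/24/2017 02:42:07 PM", "Mycomputer", "Windows Defender Network Inspection", "running"),
    ("04/24/2017 02:42:08 PM", "Mycomputer", "Windows Defender Network Inspection", "stopped"),
    ("04/25/2017 06:37:31 AM", "Mycomputer", "Windows Defender", "stopped"),
]


def summarize(events):
    """Hypothetical model of dedup groupKey followed by stats ... by host Name."""
    # dedup groupKey: keep only the newest event per (host, Name, State)
    newest = {}
    for ts, host, name, state in events:
        t = datetime.strptime(ts, FMT)
        key = (host, name, state)
        if key not in newest or t > newest[key]:
            newest[key] = t
    # min(_time), max(_time), latest(State), values(State) per (host, Name)
    summary = {}
    for (host, name, state), t in newest.items():
        s = summary.get((host, name))
        if s is None:
            summary[(host, name)] = {"MinTime": t, "MaxTime": t, "FinalState": state, "State": {state}}
            continue
        s["State"].add(state)
        s["MinTime"] = min(s["MinTime"], t)
        if t > s["MaxTime"]:
            s["MaxTime"] = t
            s["FinalState"] = state  # state of the newest event
    return summary


def last_outage_seconds(s, now):
    """Hypothetical helper: length of the most recent outage of one service."""
    if s["FinalState"] == "stopped":
        return (now - s["MaxTime"]).total_seconds()  # still stopped
    if "stopped" in s["State"]:
        return (s["MaxTime"] - s["MinTime"]).total_seconds()  # stopped, then restarted
    return 0.0  # only running events were seen
```

With the sample data, the Network Inspection service shows both states, but its newest event is a stop, so its outage is measured from MaxTime to now rather than as MaxTime-MinTime (only one second here).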

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"



Deepz2612
Explorer

I have a similar question: my search has to find the keyword "Service.com" and then the keyword "connection reset", and I should get a result only if both are present within 1 minute of each other.
Can you help me with this?



johnward4
Communicator

@niketn this seems very similar to how I'm trying to calculate the uptime/downtime percentage by host over the last 7 days and the last 30 days in my question here:

https://community.splunk.com/t5/Dashboards-Visualizations/Help-showing-the-Uptime-downtime-percentag...

You seem to have a lot of experience with this topic. I appreciate your help in advance!


dmenon84
Path Finder

Thanks a lot. For some reason rex was not working for me, so I did field extractions instead:

index=wineventlog eventtype=winsystem *The Windows Defender service entered* EventCode=7036 Name=* State=* tag=alert earliest=-1w@w
| dedup host State
| eval downTime=(now()-_time)
| table _time host Name State downTime
| search State="stopped state" AND downTime>561600

To set up an alert for the last 7 days, I had to add earliest=-1w@w, and the downtime threshold is more like 6.5 days (561600 seconds).
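One possible reason rex did not match is that Splunk field names are case-sensitive: the events carry Message, while the rex in the answer reads field=message. The pattern itself can be checked offline. This Python sketch uses the answer's regex, rewritten with Python's (?P<...>) named-group syntax and with the stray | removed from the character class, against the Message values from the question. Note that it captures State as "stopped", whereas the field extraction above yields "stopped state".

```python
import re

# The rex pattern from the answer, using Python's (?P<...>) named groups and
# without the stray "|" in the character class (it only matched a literal pipe).
PATTERN = re.compile(
    r"The (?P<Name>[a-zA-Z\s]+) Service service entered the (?P<State>[a-zA-Z]+) state\."
)

# Message values copied from the sample events in the question.
messages = [
    "The Windows Defender Network Inspection Service service entered the stopped state.",
    "The Windows Defender Service service entered the stopped state.",
    "The Windows Defender Network Inspection Service service entered the running state.",
]

parsed = []
for message in messages:
    match = PATTERN.search(message)
    if match:  # like rex, extract nothing when the pattern does not match
        parsed.append((match.group("Name"), match.group("State")))

for name, state in parsed:
    print(f"Name={name!r} State={state!r}")
```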


somesoni2
Revered Legend

What's the problem with the last event? Your filtering condition lies in that. Also, you say you want to alert when the duration is more than a week, but your where condition checks for just 500 minutes, not 24*7*60 = 10080 minutes.
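Using the DurationinMinutes values from the table in the question, this small Python check shows the difference between the two thresholds. All three results pass the 500-minute filter, but none of them comes close to 10080 minutes (one week). In SPL, the equivalent change would be | where DurationinMinutes>10080.

```python
# DurationinMinutes values copied from the table in the question.
durations = [1538.983333, 1526.616667, 1404.516667]

WEEK_MINUTES = 24 * 7 * 60  # 10080 minutes in one week

# The original "| where DurationinMinutes>500" keeps every row...
over_500 = [d for d in durations if d > 500]
# ...while a one-week threshold keeps none of them (each is roughly one day).
over_week = [d for d in durations if d > WEEK_MINUTES]

print(len(over_500), len(over_week))
```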


dmenon84
Path Finder

I am not able to paste the table here. Basically, the last event groups 2 service stopped events. I want to see the duration between service started and service stopped, but my query also returns results made up only of service stopped events, along with their duration.


somesoni2
Revered Legend

Your transaction command needs more conditions to pair events correctly. Look at the Splunk documentation for the transaction command, specifically the startswith and endswith parameters, where you need to include expressions that match the service started and stopped events. Look at the examples there.
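To illustrate why the pairs come out wrong, here is a rough Python model using the four sample events from the question. `consecutive_pairs` approximates transaction host maxevents=2 with no start/end condition by walking each host's events newest-first and grouping them two at a time. `stop_to_running` approximates adding start/end conditions: per host and service, a stop opens an outage and the next running event closes it. Both helpers are hypothetical and rest on simplified assumptions about how transaction groups events; this is not its actual implementation.

```python
from datetime import datetime

# Sample events from the question as (time, host, Name, State); Name values are assumed.
FMT = "%m/%d/%Y %I:%M:%S %p"
events = [
    ("04/23/2017 01:03:08 PM", "Mycomputer", "Windows Defender Network Inspection", "stopped"),
    ("04/24/2017 02:42:07 PM", "Mycomputer", "Windows Defender Network Inspection", "running"),
    ("04/24/2017 02:42:08 PM", "Mycomputer", "Windows Defender Network Inspection", "stopped"),
    ("04/25/2017 06:37:31 AM", "Mycomputer", "Windows Defender", "stopped"),
]


def consecutive_pairs(events):
    """Rough model of 'transaction host maxevents=2' without start/end conditions:
    walk each host's events newest-first and group them two at a time."""
    by_host = {}
    for ts, host, name, state in events:
        by_host.setdefault(host, []).append((datetime.strptime(ts, FMT), state))
    groups = []
    for host, evs in by_host.items():
        evs.sort(reverse=True)  # newest first
        for i in range(0, len(evs) - 1, 2):
            newer, older = evs[i], evs[i + 1]
            minutes = (newer[0] - older[0]).total_seconds() / 60
            groups.append((host, [older[1], newer[1]], minutes))
    return groups


def stop_to_running(events):
    """Rough model of pairing with start/end conditions: per (host, Name),
    a stop opens an outage and the next running event closes it."""
    open_stops = {}
    outages = []
    ordered = sorted((datetime.strptime(ts, FMT), host, name, state) for ts, host, name, state in events)
    for t, host, name, state in ordered:
        key = (host, name)
        if state == "stopped":
            open_stops.setdefault(key, t)  # a repeated stop keeps the earlier start
        elif state == "running" and key in open_stops:
            start = open_stops.pop(key)
            outages.append((host, name, (t - start).total_seconds() / 60))
    return outages, open_stops


for host, states, minutes in consecutive_pairs(events):
    print("naive:", host, states, round(minutes, 2))
outages, still_open = stop_to_running(events)
for host, name, minutes in outages:
    print("outage:", host, name, round(minutes, 2))
print("still stopped:", sorted(still_open))
```

With the sample data, the naive grouping yields one stopped+stopped pair plus one stopped+running pair of about 1538.98 minutes (the same value as the first row of the table in the question), while the stop-to-running version keeps only the real outage and leaves the two later stops open.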


dmenon84
Path Finder

Just to clarify: the alert should be triggered only when the Windows Defender service has been stopped for more than a week, or 10080 minutes. Since the events span 7 days, my query is giving incorrect results. Thanks in advance for any and all help!
