Alerting

With a set of events, continuously collect some based on content and alert when there's been a 5 minute gap, alert on others immediately

cdhippen
Path Finder

Our software can restart either manually (a forced restart) or on its own (a system restart). A forced restart produces an event like this:

2019-08-18 23:15:21 restartBy= restartUser=.......... restartReason=................

Followed shortly after by something like:

2019-08-18 23:19:46,222 INFO (i/o) [....] Version Information: 1.1.11

The version information event always shows up after a restart, but the restartBy event only shows up if the restart was manual. We want to alert on manual restarts immediately. System restarts we want to collect until 10 minutes have passed since the last one, then send a single alert that says there were N system restarts, that the first restart in the group was at time X, and that the last one was at time Y.
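To make the grouping I'm after concrete, here is a rough sketch in Python rather than SPL. It only illustrates the intended behaviour and is not something we run: the function name, the `RestartGroup` class, and the epoch-second inputs are made up for the example, and the 600-second gap is the 10 minutes mentioned above. It covers system restarts only and ignores the "already alerted" bookkeeping.

```python
from dataclasses import dataclass
from typing import List

GAP_SECONDS = 600  # assumed quiet period (10 minutes) that closes a group


@dataclass
class RestartGroup:
    count: int  # number of system restarts in the group
    first: int  # epoch time of the first restart in the group
    last: int   # epoch time of the last restart in the group


def closed_restart_groups(times: List[int], now: int,
                          gap: int = GAP_SECONDS) -> List[RestartGroup]:
    """Group restart times separated by at most `gap` seconds and
    return only the groups that have been quiet for at least `gap`."""
    groups: List[RestartGroup] = []
    for t in sorted(times):
        if groups and t - groups[-1].last <= gap:
            # Still inside the current burst of restarts: extend it.
            groups[-1].count += 1
            groups[-1].last = t
        else:
            # Gap exceeded (or first event): start a new group.
            groups.append(RestartGroup(count=1, first=t, last=t))
    # A group is ready to alert on once nothing has happened for `gap` seconds.
    return [g for g in groups if now - g.last >= gap]
```

With restarts at 0, 100, 500, 2000 and 2300 seconds and `now=2500`, only the first group (3 restarts, first at 0, last at 500) would be alerted; the second group is still open because its last restart was 200 seconds ago. The two groups are never counted together, which is the part I can't get right in the search.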

There are two problems so far. First, if the restarts keep occurring for longer than the search window, the earlier ones don't show up in the collected alert. Second, if there are two separate groups of restarts within the time range, I end up counting the restarts from both groups together. This is what I've got so far, but it's throwing me for a loop and I'm having trouble finishing it:

| inputlookup partial_day_core_restarts.csv
| search alerted="false"
| eval new="false" 
| append 
    [| search 
        <base search>
    | eval build=coalesce(build, signature) ```the build version is sometimes reported in the signature field```
    | rex "restartReason\=(?<restartReason>.*)" 
    | rex "service\/(?<core>.*)" 
    | rex "sudo: (?<restartUser>.*) :" 
    | lookup workspace workspaceGuid output currentCustomerGuid as customerGuid 
    | lookup customer-dc5prod customerGuid output name as customerName 
    | eventstats values(eval(if(isnotnull(restartReason), restartReason, null()))) as restartReason values(eval(if(isnotnull(restartUser), restartUser, null()))) as restartUser by core 
    | eval restart=core + ":::::" + restartReason + ":::::" + restartUser 
    | eval restart=coalesce(restart, core + ":::::System Restart:::::System Restart") 
    | mvexpand restart 
    | eval build1=build 
    | eventstats latest(eval(if(isnotnull(build), _time, null()))) as restartTime by restart 
    | eventstats values(eval(if(_time>restartTime-2000, workspaceGuid, null()))) as workspaces values(eval(if(_time>restartTime-2000, customerName, null()))) as customers latest(build) as build by restart 
    | eval customers=mvjoin(customers, "::"), workspaces=mvjoin(workspaces, "::")
    | fillnull customers workspaces value="None Active" 
    | table workspaces customers restartTime build build1 core restart 
    | where isnotnull(build1) 
    | eval restartReason=mvindex(split(restart, ":::::"), 1), restartUser=mvindex(split(restart, ":::::"), 2) 
    | eval new="true"]
| stats values(new) as new by build core customers restartReason restartTime restartUser workspaces
| eventstats max(restartTime) as restartTime1 by restartReason
| eval alert=case(restartTime1>now()-300 AND restartReason="System Restart", "false", mvcount(new)>1, "false", match(new, "false"), "false", match(new, "true"), "true")
| eval new="false"
| outputlookup partial_day_core_restarts.csv
| search alert="true"
| stats values(core) as core values(customers) as customers values(workspaces) as workspaces by restartTime1 restartUser restartReason build
| eval core=if(mvcount(core)>5, tostring(mvcount(core)) + " cores were restarted", core), customers=replace(customers, "::", ", ")
| convert ctime(restartTime1) as restartTime timeformat="%Y-%m-%d %H:%M:%S"
| eval customers=replace(replace(mvjoin(customers, ", "), "None Active, ", ""), ", None Active", ""), workspaces=replace(replace(mvjoin(workspaces, ", "), "None Active, ", ""), ", None Active", ""), core=mvjoin(core, ", ")
| fillnull customers workspaces value="None Active"
| eval throttle=md5(restartTime.restartUser.restartReason.build)

I'm also totally open to simpler ways of doing this.
