Splunk Search
Highlighted

Efficient and "correct" way to counting stats based on a *sequence* of events within a rolling timeframe

Path Finder

Creating stats count based on a sequence of events within a timeframe. For example, count the unique sessions, within a 6-hour timeframe, that resulted in 1- Failures without Success, 2- Success, or 3- Failures followed by Success:

SessionID   Time      Action
Abcd      12:03:11  Failure
Abcd      12:04:19  Failure
m        12:05:49   Failure
XXXXX    12:06:20   Failure
XXXXX   12:07:34    Failure
Abcd      12:10:11  Failure
Abcd      12:23:12  Success
ZZ      12:28:10    Success
XXXXX    12:31:00   Failure
Abcd      21:03:11  Success
m         22:03:11  Failure
m         22:03:12  Success

Produces:

Failure_no_success | Success | Failure_then_success
    2              |   2     |       2

Where Failurenosuccess is the three XXXXX and the first m sessions, Success is the ZZ session and the last Abcd session, and Failurethensuccess is the four Abcd and the last two m sessions.

There are multiple inefficient ways to solve this, like combining many subsearches, outputing some of the data to a lookup table and reading it back, etc. But is there a "correct" and scalable way to perform this count?

Highlighted

Re: Efficient and "correct" way to counting stats based on a *sequence* of events within a rolling timeframe

Explorer

A starting point is a distinct count with a timechart dc(sessionID) span=6h. I dont know how to do the multiple sequence events

0 Karma
Highlighted

Re: Efficient and "correct" way to counting stats based on a *sequence* of events within a rolling timeframe

Motivator

This is a bit tricky, but you could try going forward like this:

<base search>
| streamstats count(eval(Action=="Failure")) as failed count(eval(Action=="Success")) as succeeded last(Action) as last_action by SessionID
| eval status=case(failed>0 AND succeeded=0, "Failure_no_successs", failed=0 AND succeeded>0, "Success", failed>0 AND succeeded>0 AND last_action="Success", "Failure_then_success",true(),"other")
| stats count by status
0 Karma
Highlighted

Re: Efficient and "correct" way to counting stats based on a *sequence* of events within a rolling timeframe

Path Finder

Almost but not there.

 | eval status=case(failed>0 AND succeeded=0, "Failure_no_successs", failed=0 AND succeeded>0, "Success", failed>0 AND succeeded>0 AND last_action="Success", "Failure_then_success",true(),"other")

Doesn't evaluate the Failurethensuccess and marks it as Success. I'll dig deeper using streamstats There's also no grouping by time (6 hours)

0 Karma
Highlighted

Re: Efficient and "correct" way to counting stats based on a *sequence* of events within a rolling timeframe

Esteemed Legend

Like this:

|makeresults | eval raw="SessionID=Abcd,Time=12:03:11,Action=Failure SessionID=Abcd,Time=12:04:19,Action=Failure SessionID=m,Time=12:05:49,Action=Failure SessionID=XXXXX,Time=12:06:20,Action=Failure SessionID=XXXXX,Time=12:07:34,Action=Failure SessionID=Abcd,Time=12:10:11,Action=Failure SessionID=Abcd,Time=12:23:12,Action=Success SessionID=ZZ,Time=12:28:10,Action=Success SessionID=XXXXX,Time=12:31:00,Action=Failure SessionID=Abcd,Time=21:03:11,Action=Success SessionID=m,Time=22:03:11,Action=Failure SessionID=m,Time=22:03:12,Action=Success"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| streamstats count
| eval Time = strptime(Time, "%H:%M:%S")
| fields - count
| kv

| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"

| stats list(Action) AS Action list(Time) AS Time count(eval(Action="Success")) AS SuccessCount count(eval(Action="Failure")) AS FailureCount BY SessionID
| eval category=case(
   ((SuccessCount==0) AND (FailureCount>0)), "FailureNoSuccess",
   ((SuccessCount>0) AND (FailureCount==0)), "Success",
   ((SuccessCount>0) AND (FailureCount>0)), "FailureThenSuccess",
   true(), "UNKNOWN")
| eval foo="bar"
| chart dc(SessionID) BY foo category
| table FailureNoSuccess Success FailureThenSuccess
0 Karma