Splunk Search

How to use streamstats to evaluate and filter a row at a time?

Path Finder

I'm looking for a way to filter search results by calculating time deltas between pairs of rows (the goal is to extract contiguous events based on a 0-or-greater time delta from the end of one event to the start of the next). The problem is that I need to evaluate each row pair and filter in a single operation, then repeat for the entire data series. The only way I have been able to achieve this so far is to use streamstats, filter out one row at a time, and rerun the streamstats until the result set no longer shrinks. Is there a better way?

Example data set (in reverse chronological 'end time' order):

Event           Start_Time          End_Time        Delta
Event_1         13:10:00            13:20:00        -
Event_2         13:07:00            13:15:00        -5:00
Event_3         13:06:00            13:14:00        -7:00
Event_4         13:00:00            13:10:00        -4:00
Event_5         12:50:00            13:00:00        0

Desired Output

Event           Start_Time          End_Time        Delta
Event_1         13:10:00            13:20:00        -
Event_4         13:00:00            13:10:00        0
Event_5         12:50:00            13:00:00        0

Actual Output

Event           Start_Time          End_Time        Delta
Event_1         13:10:00            13:20:00        -
Event_5         12:50:00            13:00:00        0

Original Query

<query> | streamstats current=f window=1 global=f last(Start_Time) as Next_Start | where (Next_Start - End_Time) >= 0
-- Next_Start is the 'Start_Time' from the first row, and End_Time is from the second row

The problem is that this filters out Event_4, because streamstats evaluates the entire data set in one operation before the filter runs: Event_4's Next_Start comes from Event_3 (which should itself have been removed) rather than from Event_1.
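For what it's worth, the filtering rule a single streamstats pass cannot express can be sketched in plain Python (purely illustrative, not SPL): each event must be compared against the start of the last *kept* event, not the immediately preceding raw event, which is why Event_4 should survive.

```python
from datetime import datetime

def parse(t):
    return datetime.strptime(t, "%H:%M:%S")

# Sample events from the question, in reverse chronological order
events = [
    ("Event_1", "13:10:00", "13:20:00"),
    ("Event_2", "13:07:00", "13:15:00"),
    ("Event_3", "13:06:00", "13:14:00"),
    ("Event_4", "13:00:00", "13:10:00"),
    ("Event_5", "12:50:00", "13:00:00"),
]

kept = [events[0]]  # always keep the newest event
for name, start, end in events[1:]:
    # delta = start of the last *kept* (later) event minus this event's end
    delta = (parse(kept[-1][1]) - parse(end)).total_seconds()
    if delta >= 0:
        kept.append((name, start, end))

print([e[0] for e in kept])  # ['Event_1', 'Event_4', 'Event_5']
```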

Current working (but inefficient) query. I don't really know how many times to repeat the filter stage, although strangely enough it doesn't seem to cause a significant time impact even with 30+ repetitions:

<query> | streamstats current=f window=1 global=f last(Start_Time) as Next_Start | eval diff = Next_Start - End_Time   | streamstats current=f window=1 global=f last(diff) as prev_diff | eval diff = if(diff<0 AND prev_diff<0,0,diff) | search diff >= 0 
| streamstats current=f window=1 global=f last(Start_Time) as Next_Start | eval diff = Next_Start - End_Time   | streamstats current=f window=1 global=f last(diff) as prev_diff | eval diff = if(diff<0 AND prev_diff<0,0,diff) | search diff >= 0  
| streamstats current=f window=1 global=f last(Start_Time) as Next_Start | eval diff = Next_Start - End_Time   | streamstats current=f window=1 global=f last(diff) as prev_diff | eval diff = if(diff<0 AND prev_diff<0,0,diff) | search diff >= 0 
... <repeat until data set doesn't reduce any further>
-- Only consider the first two rows with a negative result. If there are contiguous rows with a negative result, zero out all but the first and let the next streamstats call filter them one by one.
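The repeat-until-stable behaviour of the query above can be sketched as a fixed-point loop (illustrative Python only; it assumes events have already been reduced to (name, start_seconds, end_seconds) tuples in reverse chronological order):

```python
def one_pass(rows):
    """One streamstats/eval/search round over (name, start_sec, end_sec) rows."""
    out = [rows[0]]
    for i in range(1, len(rows)):
        delta = rows[i - 1][1] - rows[i][2]  # previous row's start - this row's end
        prev_delta = rows[i - 2][1] - rows[i - 1][2] if i >= 2 else 0
        # Mimic the eval: zero out all but the first of a run of negatives,
        # so each pass drops only the first row of each negative run.
        if delta >= 0 or (delta < 0 and prev_delta < 0):
            out.append(rows[i])
    return out

def filter_until_stable(rows):
    """Repeat the pass until the result set no longer shrinks."""
    while True:
        nxt = one_pass(rows)
        if len(nxt) == len(rows):
            return rows
        rows = nxt

# Sample data from the question, times converted to seconds since midnight
events = [
    ("Event_1", 47400, 48000),  # 13:10:00 - 13:20:00
    ("Event_2", 47220, 47700),  # 13:07:00 - 13:15:00
    ("Event_3", 47160, 47640),  # 13:06:00 - 13:14:00
    ("Event_4", 46800, 47400),  # 13:00:00 - 13:10:00
    ("Event_5", 46200, 46800),  # 12:50:00 - 13:00:00
]
print([e[0] for e in filter_until_stable(events)])  # ['Event_1', 'Event_4', 'Event_5']
```

On this sample it converges after three passes (dropping Event_2, then Event_3), which matches the one-row-per-negative-run behaviour described in the comment above.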

Any suggestions would be greatly appreciated. Thanks


SplunkTrust

Give this a try (a run-anywhere search using your sample data; everything before the convert command just generates the sample data, so replace that part with your actual search):

| gentimes start=-1 | eval temp="Event_1            13:10:00            13:20:00#Event_2            13:07:00            13:15:00#Event_3            13:06:00            13:14:00#Event_4            13:00:00            13:10:00#Event_5            12:50:00            13:00:00" | table temp | makemv temp delim="#" | mvexpand temp | rex field=temp "(?<Event>\S+)\s+(?<Start_Time>\S+)\s+(?<End_Time>\S+)" | fields - temp 
| convert dur2sec(End_Time) as s_End dur2sec(Start_Time) as s_Start | streamstats current=f window=1 values(s_Start) as next_start | eval delta1=next_start-s_End | reverse | streamstats current=f window=1 values(s_End) as prev_end | eval delta2=s_Start-prev_end | where coalesce(delta1,0)=0 OR coalesce(delta2,0)=0 | reverse | table Event Start_Time End_Time 

Path Finder

Thanks for this. While it didn't solve my problem exactly (it only has a 1-element lookahead, before and after), it did put me on a better path. I've ended up with the following, which does a windowed lookahead, before and after, to determine whether there are contiguous events. If there aren't (and it isn't the start or end event), it eliminates the event.

| streamstats current=f window=30 values(LAST_START) as next_start | eval next_start = if(isnull(next_start),LAST_END,next_start)  | reverse | streamstats current=f window=30 values(LAST_END) as prev_end | eval prev_end = if(isnull(prev_end),LAST_START,prev_end) | mvexpand next_start | eval delta1=next_start-LAST_END | mvexpand prev_end | eval delta2=LAST_START-prev_end| where coalesce(delta1,0)=0 AND coalesce(delta2,0)=0 | reverse
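As a sanity check, the windowed lookahead/lookbehind idea can be expressed in plain Python (illustrative only; the window size mirrors the query above, and the generic field names are my own): an event is kept only if some later event starts exactly when it ends and some earlier event ends exactly when it starts, with the boundary events exempted.

```python
def contiguous_filter(rows, window=30):
    """rows: (name, start_sec, end_sec) tuples in reverse chronological order."""
    keep = []
    for i, (name, start, end) in enumerate(rows):
        # starts of up to `window` later events (they appear *before* row i)
        next_starts = [r[1] for r in rows[max(0, i - window):i]]
        # ends of up to `window` earlier events (they appear *after* row i)
        prev_ends = [r[2] for r in rows[i + 1:i + 1 + window]]
        ok_fwd = not next_starts or any(s - end == 0 for s in next_starts)
        ok_bwd = not prev_ends or any(start - e == 0 for e in prev_ends)
        if ok_fwd and ok_bwd:
            keep.append(name)
    return keep

# Sample data from the question, times in seconds since midnight
events = [
    ("Event_1", 47400, 48000),  # 13:10:00 - 13:20:00
    ("Event_2", 47220, 47700),  # 13:07:00 - 13:15:00
    ("Event_3", 47160, 47640),  # 13:06:00 - 13:14:00
    ("Event_4", 46800, 47400),  # 13:00:00 - 13:10:00
    ("Event_5", 46200, 46800),  # 12:50:00 - 13:00:00
]
print(contiguous_filter(events))  # ['Event_1', 'Event_4', 'Event_5']
```

Unlike the repeated-streamstats approach, this decides each event in a single pass, at the cost of only looking `window` events in each direction.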

Splunk Employee

Hi @brunton2 - Did the search you wrote above provide a working solution to your question?


Path Finder

Yes and no, unfortunately. Yes from the perspective that it did what I wanted. No from the perspective that the performance of mvexpand makes the solution unbearably slow in the large data set scenario I'm trying to solve.

What I'd really like to do is perform an mvfind with a numeric comparison against each element in the mvlist. Unfortunately the comparison needs to use the LAST_START and LAST_END fields, which mvfind appears unable to support.

What I've resorted to doing is extracting each of the mvlist elements as eval variables and comparing against each in sequence. The performance of the query is orders of magnitude faster than the mvexpand approach but it makes the query horrific to look at!

Unless someone can suggest a method for either optimizing mvexpand over large data sets or using mvfind with a field value in the comparison, I think I'm stuck with this approach.


Splunk Employee

No problem. We'll leave this open just in case another user would like to take a stab at it.
