Parallel stats - most efficient structure

JeToJedno
Explorer

I frequently have to create stats reports where some parts are, essentially, executable in parallel with others. An example would be:

search <something> 
  | appendpipe [ eval _time=FLOOR(_time/60)*60 | stats DC(id) AS unique_devices_minute BY device_type, _time ]
  | appendpipe [ eval _time=FLOOR(_time/3600)*3600 | stats DC(id) AS unique_devices_hour BY device_type, _time ]
  | appendpipe [ eval _time=FLOOR(_time/86400)*86400 | stats DC(id) AS unique_devices_day BY device_type, _time ]
  | eval _time=FLOOR(_time/60)*60
  | stats <<some summaries>> SUM(unique_devices_minute) AS unique_devices_minute, SUM(unique_devices_hour) AS unique_devices_hour, SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
  | eval period="minute"
  | appendpipe [ eval _time=FLOOR(_time/3600)*3600 
    | stats <<sum per-minute to per-hour>> SUM(unique_devices_hour) AS unique_devices_hour, SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
    | eval period="hour" ]
  | appendpipe [ where period="hour" | eval _time=FLOOR(_time/86400)*86400 
    | stats <<sum per-hour to per-day>>  SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
    | eval period="day" ]
  | eval unique_devices=CASE(period="minute",unique_devices_minute,period="hour",unique_devices_hour,eval period="day",unique_devices_day ) | fields - unique_devices_minute, unique_devices_hour, unique_devices_day 

This gives the results I want in a single report, but is it the most efficient way to structure this?


cmerriman
Super Champion

You could do something like this:

|multireport
[|bucket _time span=1min| stats DC(id) as unique_devices_minute BY device_type _time|eval period="minute"]
[|bucket _time span=1h| stats DC(id) as unique_devices_hour BY device_type _time|eval period="hour"]
[|bucket _time span=1d| stats DC(id) as unique_devices_day BY device_type _time|eval period="day"]
| eval unique_devices=CASE(period="minute",unique_devices_minute,period="hour",unique_devices_hour, period="day",unique_devices_day )
| fields - unique_devices_minute, unique_devices_hour, unique_devices_day

I see that you're doing some sort of summing by minute/hour/day before you eval unique_devices in separate appendpipes, but I'm not entirely sure what that's doing in the end. Some sample data and desired output might be more helpful.
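For what it's worth, if the extra summaries are ordinary aggregates (average/max request rates, say), they could presumably ride along inside each multireport branch; a sketch, where rate is a hypothetical per-event field standing in for whatever the real one is:

|multireport
[|bucket _time span=1min| stats DC(id) as unique_devices, avg(rate) as avg_rate, max(rate) as max_rate BY device_type _time|eval period="minute"]
[|bucket _time span=1h| stats DC(id) as unique_devices, avg(rate) as avg_rate, max(rate) as max_rate BY device_type _time|eval period="hour"]
[|bucket _time span=1d| stats DC(id) as unique_devices, avg(rate) as avg_rate, max(rate) as max_rate BY device_type _time|eval period="day"]

Giving the distinct count the same name (unique_devices) in every branch also removes the need for the trailing CASE and fields cleanup.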


JeToJedno
Explorer

The other stats calculated are always average, peak & maximum request rates, and sometimes a count of unique devices by their first-seen time within the day. Peak rate = the maximum of a 2-second moving average of the request rate, or a close approximation (e.g. the 98th percentile of the per-second request rate within each minute, then the maximum of those per-minute 98th percentiles for the hour and day).
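A rough sketch of that approximation, with illustrative field names (rps, peak_rps_*) rather than the ones in the actual search:

search <something>
| bucket _time span=1s
| stats count as rps by device_type, _time
| bucket _time span=1min
| stats perc98(rps) as peak_rps_minute by device_type, _time
| bucket _time span=1h
| stats max(peak_rps_minute) as peak_rps_hour by device_type, _time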

I was hoping to avoid making multiple independent passes through the whole dataset, but I can see now that's probably not going to be possible, as a distinct count over a different time period will always require a separate pass through the base data. 😞
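The sticking point is that dc() isn't additive across buckets: summing per-minute distinct counts over-counts whenever the same id appears in more than one minute. A self-contained toy demonstration:

| makeresults count=2
| streamstats count as n
| eval id="A", _time=_time - n*60
| bucket _time span=1min
| stats dc(id) as dc_minute by _time
| stats sum(dc_minute) as summed_dc

Each of the two minutes has dc(id)=1, so summed_dc comes out as 2, while the true distinct count across both minutes is 1; the per-minute results alone can't reveal that the ids were the same.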


JeToJedno
Explorer

Before anyone says anything, the first appendpipe (line 2 of my search, the per-minute DC) is unnecessary; its DC(id) can be computed directly in the stats command in line 6.

... and there's a spurious "eval" in line 14, just before period="day" inside the CASE(). 😞
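With the stray eval removed, that line should read:

| eval unique_devices=CASE(period="minute",unique_devices_minute,period="hour",unique_devices_hour,period="day",unique_devices_day) | fields - unique_devices_minute, unique_devices_hour, unique_devices_day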
