Splunk Search

Parallel stats - most efficient structure

JeToJedno
Explorer

I frequently have to create stats reports where some parts are, essentially, executable in parallel with others. An example would be:

search <something> 
  | appendpipe [ eval _time=FLOOR(_time/60)*60 | stats DC(id) AS unique_devices_minute BY device_type, _time ]
  | appendpipe [ eval _time=FLOOR(_time/3600)*3600 | stats DC(id) AS unique_devices_hour BY device_type, _time ]
  | appendpipe [ eval _time=FLOOR(_time/86400)*86400 | stats DC(id) AS unique_devices_day BY device_type, _time ]
  | eval _time=FLOOR(_time/60)*60
  | stats <<some summaries>> SUM(unique_devices_minute) AS unique_devices_minute, SUM(unique_devices_hour) AS unique_devices_hour, SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
  | eval period="minute"
  | appendpipe [ eval _time=FLOOR(_time/3600)*3600 
    | stats <<sum per-minute to per-hour>> SUM(unique_devices_hour) AS unique_devices_hour, SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
    | eval period="hour" ]
  | appendpipe [ where period="hour" | eval _time=FLOOR(_time/86400)*86400 
    | stats <<sum per-hour to per-day>>  SUM(unique_devices_day) AS unique_devices_day BY device_type, _time
    | eval period="day" ]
  | eval unique_devices=CASE(period="minute",unique_devices_minute,period="hour",unique_devices_hour,eval period="day",unique_devices_day ) | fields - unique_devices_minute, unique_devices_hour, unique_devices_day 

This gives the results I want in a single report, but is it the most efficient way to structure this?
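For readers less familiar with SPL, the core of the search above is epoch-time bucketing (`FLOOR(_time/span)*span`) followed by a distinct count per bucket. A minimal Python sketch of that idea, with made-up sample events (names and timestamps are illustrative, not from the thread):

```python
from collections import defaultdict

# Hypothetical sample events: (epoch seconds, device_type, device id).
events = [
    (5, "sensor", "a"),
    (70, "sensor", "a"),    # same device again, in the next minute
    (80, "sensor", "b"),
    (3700, "camera", "c"),
]

def bucket(ts, span):
    """Floor an epoch timestamp to a span boundary, like FLOOR(_time/span)*span."""
    return (ts // span) * span

def distinct_per_bucket(events, span):
    """Distinct device ids per (device_type, bucket) - the dc(id) BY device_type, _time idea."""
    seen = defaultdict(set)
    for ts, dtype, dev_id in events:
        seen[(dtype, bucket(ts, span))].add(dev_id)
    return {key: len(ids) for key, ids in seen.items()}

per_minute = distinct_per_bucket(events, 60)
# {("sensor", 0): 1, ("sensor", 60): 2, ("camera", 3660): 1}
per_hour = distinct_per_bucket(events, 3600)
# {("sensor", 0): 2, ("camera", 3600): 1}
```

Note that each span needs its own pass over the raw ids: the per-hour distinct count is not derivable from the per-minute counts alone.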


cmerriman
Super Champion

you could do something like this:

| multireport
  [ | bucket _time span=1min | stats dc(id) AS unique_devices_minute BY device_type, _time | eval period="minute" ]
  [ | bucket _time span=1h | stats dc(id) AS unique_devices_hour BY device_type, _time | eval period="hour" ]
  [ | bucket _time span=1d | stats dc(id) AS unique_devices_day BY device_type, _time | eval period="day" ]
| eval unique_devices=CASE(period="minute",unique_devices_minute, period="hour",unique_devices_hour, period="day",unique_devices_day)
| fields - unique_devices_minute, unique_devices_hour, unique_devices_day

I see that you're doing some sort of summing by minute/hour/day before you eval unique_devices in separate appendpipes, but I'm not entirely sure what that's doing in the end. Some sample data and desired output might be more helpful.


JeToJedno
Explorer

The other stats calculated are always the average, peak, and maximum request rates, and sometimes a count of devices unique by first-seen time within the day. Peak rate is the maximum of a 2-second moving-average request rate, or a close approximation (e.g. the 98th percentile of the per-second request rate within each minute, then the maximum of the per-minute 98th percentiles for the hour and day).
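A rough Python illustration of that percentile-based peak approximation (the sample data and the nearest-rank method are assumptions; Splunk's own percentile functions may compute slightly differently):

```python
import math

# Hypothetical per-second request counts within one minute,
# including a single one-second burst of 300.
per_second = [10, 12, 11, 300, 13, 12] + [11] * 54

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

minute_p98 = percentile(per_second, 98)  # damps the one-second spike
minute_max = max(per_second)             # dominated by the spike
```

The per-minute p98 ignores the one-off burst that the raw maximum would report; the hourly and daily peaks would then be the max over these per-minute values.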

I was hoping to avoid making multiple independent passes through the whole dataset, but I can see now that's probably not going to be possible, as a count distinct over a different time period will always require a separate pass through the base data. 😞
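That conclusion can be seen with a toy example (device ids are made up): distinct counts are not additive across buckets, so each period's dc() has to see the underlying ids.

```python
# Hypothetical device-id sets seen in three consecutive minutes of one hour.
minute_buckets = [{"a"}, {"a", "b"}, {"b", "c"}]

# Summing the per-minute distinct counts double-counts devices that are
# active in more than one minute...
sum_of_minute_dc = sum(len(ids) for ids in minute_buckets)  # 1 + 2 + 2 = 5

# ...whereas the true hourly distinct count needs the union of the raw ids.
hour_dc = len(set().union(*minute_buckets))                 # {"a", "b", "c"} -> 3
```

Short of keeping a mergeable approximate sketch per bucket (e.g. HyperLogLog), the exact hourly and daily counts each need their own look at the base data.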


JeToJedno
Explorer

Before anyone says anything, the first appendpipe (line 2) is unnecessary and can be part of the stats command in line 6.

... and there's a spurious "eval" in line 14. 😞
