Splunk Search

How can you use max with streamstats?

Kenshiro70
Path Finder

I have a list of events which are watermarks for customer activities. The data look like this:

Date/Time      Customer  Watermark
07/12/16 09:00 CustomerA      1000
07/12/16 10:00 CustomerA      5000
07/12/16 11:00 CustomerA      8000
07/12/16 18:00 CustomerB     10000
07/12/16 19:00 CustomerB     12000

I need to get the max activity per hour for each customer, so my result set looks like this:

Customer   Act/Hour
CustomerA      4000
CustomerB      2000

I used streamstats to get the difference and calculate the activities, but when I try and use stats to get the max of those, streamstats reprocesses the events and messes up the result set. Here’s my code:

| streamstats current=f window=2 last(watermark) as last_watermark by customer
| eval ActivityCount = watermark - last_watermark
| stats max(ActivityCount) as MaxCount by customer

I tried dedup without any luck either. I even tried to join the query to itself, like so:

| join customer_dbname [search index=mysql
  | streamstats current=f window=2 last(watermark) as last_watermark by customer
  | eval ActivityCount = watermark - last_watermark
  | fields customer ActivityCount]
| stats max(ActivityCount) as MaxCount by customer

I would have thought the usetime and earlier options would have made this work, but no luck here either - ActivityCount came back as blank.

Since the customer timing is variable, I need to run this over a large range or else I’ll miss customers.

Any thoughts/suggestions? Thanks!

Tags (1)
0 Karma

Kenshiro70
Path Finder

The answers here spurred me to look further into the raw data, and I realized the results that disappeared when I ran for a longer time range were the ones with multiple values. That should have been handled by the window setting, but apparently it causes issues. On a lark I added dedup before the streamstats, like this:

| dedup 2 customer sortby watermark
| sort customer watermark
| streamstats current=f window=1 values(watermark) as last_watermark by customer
| eval ActivityCount = watermark - last_watermark
| stats max(ActivityCount) as MaxCount by customer | sort -MaxCount

This worked in one sense It gave a much higher number of results when I ran the query for a longer time period. But when I drilled into the data more, about a third of them weren't showing the true max. As I feared, putting dedup before the streamstats command just reduced the results to the most recent two.

Long story short, streamstats and stats just don't play nice.

0 Karma

somesoni2
Revered Legend

Give this a try

your base search | sort _time  | streamstats current=f window=1 value(watermark) as last_watermark by customer
 | eval ActivityCount = watermark - last_watermark
 | stats max(ActivityCount) as MaxCount by customer
0 Karma

Kenshiro70
Path Finder

The sixteen hour search is still returning more than the eight hour search. Thanks though.

0 Karma

sundareshr
Legend

Try this

.... | streamstats window=2 range(Watermark) as ActivityCount by Customer | stats max(ActivityCount) as MaxCount by Customer
0 Karma

Kenshiro70
Path Finder

It's much more elegant code, but it still suffers from the same reprocessing problem. The way I can see this is that if I run the query with earliest=-24h I get fewer results than if I run it with earliest=-8h. Very weird.

I may end up just doing several runs with a Date & Time Range of eight hours.

Thanks for replying.

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...