Splunk Search

How can you use max with streamstats?

Kenshiro70
Path Finder

I have a list of events which are watermarks for customer activities. The data look like this:

Date/Time      Customer  Watermark
07/12/16 09:00 CustomerA      1000
07/12/16 10:00 CustomerA      5000
07/12/16 11:00 CustomerA      8000
07/12/16 18:00 CustomerB     10000
07/12/16 19:00 CustomerB     12000

I need to get the max activity per hour for each customer, so my result set looks like this:

Customer   Act/Hour
CustomerA      4000
CustomerB      2000

I used streamstats to get the difference and calculate the activities, but when I try and use stats to get the max of those, streamstats reprocesses the events and messes up the result set. Here’s my code:

| streamstats current=f window=2 last(watermark) as last_watermark by customer
| eval ActivityCount = watermark - last_watermark
| stats max(ActivityCount) as MaxCount by customer

I tried dedup without any luck either. I even tried to join the query to itself, like so:

| join customer_dbname [search index=mysql
  | streamstats current=f window=2 last(watermark) as last_watermark by customer
  | eval ActivityCount = watermark - last_watermark
  | fields customer ActivityCount]
| stats max(ActivityCount) as MaxCount by customer

I would have thought the usetime and earlier options would have made this work, but no luck here either - ActivityCount came back as blank.

Since the customer timing is variable, I need to run this over a large range or else I’ll miss customers.

Any thoughts/suggestions? Thanks!

Tags (1)
0 Karma

Kenshiro70
Path Finder

The answers here spurred me to look further into the raw data, and I realized the results that disappeared when I ran for a longer time range were the ones with multiple values. That should have been handled by the window setting, but apparently it causes issues. On a lark I added dedup before the streamstats, like this:

| dedup 2 customer sortby watermark
| sort customer watermark
| streamstats current=f window=1 values(watermark) as last_watermark by customer
| eval ActivityCount = watermark - last_watermark
| stats max(ActivityCount) as MaxCount by customer | sort -MaxCount

This worked in one sense It gave a much higher number of results when I ran the query for a longer time period. But when I drilled into the data more, about a third of them weren't showing the true max. As I feared, putting dedup before the streamstats command just reduced the results to the most recent two.

Long story short, streamstats and stats just don't play nice.

0 Karma

somesoni2
Revered Legend

Give this a try

your base search | sort _time  | streamstats current=f window=1 value(watermark) as last_watermark by customer
 | eval ActivityCount = watermark - last_watermark
 | stats max(ActivityCount) as MaxCount by customer
0 Karma

Kenshiro70
Path Finder

The sixteen hour search is still returning more than the eight hour search. Thanks though.

0 Karma

sundareshr
Legend

Try this

.... | streamstats window=2 range(Watermark) as ActivityCount by Customer | stats max(ActivityCount) as MaxCount by Customer
0 Karma

Kenshiro70
Path Finder

It's much more elegant code, but it still suffers from the same reprocessing problem. The way I can see this is that if I run the query with earliest=-24h I get fewer results than if I run it with earliest=-8h. Very weird.

I may end up just doing several runs with a Date & Time Range of eight hours.

Thanks for replying.

0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...