I have a list of events which are watermarks for customer activities. The data look like this:
Date/Time Customer Watermark
07/12/16 09:00 CustomerA 1000
07/12/16 10:00 CustomerA 5000
07/12/16 11:00 CustomerA 8000
07/12/16 18:00 CustomerB 10000
07/12/16 19:00 CustomerB 12000
I need to get the max activity per hour for each customer, so my result set looks like this:
Customer Act/Hour
CustomerA 4000
CustomerB 2000
I used streamstats to get the difference and calculate the activities, but when I try and use stats to get the max of those, streamstats reprocesses the events and messes up the result set. Here’s my code:
| streamstats current=f window=2 last(watermark) as last_watermark by customer
| eval ActivityCount = watermark - last_watermark
| stats max(ActivityCount) as MaxCount by customer
I tried dedup without any luck either. I even tried to join the query to itself, like so:
| join customer_dbname [search index=mysql
| streamstats current=f window=2 last(watermark) as last_watermark by customer
| eval ActivityCount = watermark - last_watermark
| fields customer ActivityCount]
| stats max(ActivityCount) as MaxCount by customer
I would have thought the usetime and earlier options would have made this work, but no luck here either - ActivityCount came back as blank.
Since the customer timing is variable, I need to run this over a large range or else I’ll miss customers.
Any thoughts/suggestions? Thanks!
The answers here spurred me to look further into the raw data, and I realized the results that disappeared when I ran for a longer time range were the ones with multiple values. That should have been handled by the window setting, but apparently it causes issues. On a lark I added dedup before the streamstats, like this:
| dedup 2 customer sortby watermark
| sort customer watermark
| streamstats current=f window=1 values(watermark) as last_watermark by customer
| eval ActivityCount = watermark - last_watermark
| stats max(ActivityCount) as MaxCount by customer | sort -MaxCount
This worked in one sense It gave a much higher number of results when I ran the query for a longer time period. But when I drilled into the data more, about a third of them weren't showing the true max. As I feared, putting dedup before the streamstats command just reduced the results to the most recent two.
Long story short, streamstats and stats just don't play nice.
Give this a try
your base search | sort _time | streamstats current=f window=1 value(watermark) as last_watermark by customer
| eval ActivityCount = watermark - last_watermark
| stats max(ActivityCount) as MaxCount by customer
The sixteen hour search is still returning more than the eight hour search. Thanks though.
Try this
.... | streamstats window=2 range(Watermark) as ActivityCount by Customer | stats max(ActivityCount) as MaxCount by Customer
It's much more elegant code, but it still suffers from the same reprocessing problem. The way I can see this is that if I run the query with earliest=-24h I get fewer results than if I run it with earliest=-8h. Very weird.
I may end up just doing several runs with a Date & Time Range of eight hours.
Thanks for replying.