Splunk Search

how to calculate a running average of events by user for 'All time' (Part 2)

Log_wrangler
Builder

I am looking for a way to compare an hourly avg(count) with the 'All time' historic average.

Below is a sample query provided by FrankVl.

  index=some_db sourcetype=syslog_transactions
  | bin _time span=1h
  | stats count as hourly_count by USER, _time
  | sort _time
  | streamstats avg(hourly_count) as historic_avg by USER

This works fine for the last 24 hours, but now I want to take the last 24 hours of hourly_count and compare it to an 'All time' running avg(hourly_count).

My overall objective is to create a behavioral detection where I can also set a threshold to see when transactions spike up.

For example, if user X has a normal range of transactions every hour (between 0-100/hr) and suddenly user X jumps up 10x the normal like 1k transactions/hr, then I want to be able to find this based on a calculation against the normal average for all time.

And then I want to exclude the abnormal hourly_counts (like 1000/hr when 10/hr is normal) from the 'All time' average, so that spikes do not skew the baseline.

I reworked the above query (below) and can adjust the where condition, but I feel like I am reinventing the wheel, as someone has probably needed a similar solution.

  index=some_db sourcetype=syslog_transactions
  | bin _time span=1h
  | stats count as hourly_count by USER, _time
  | sort _time
  | streamstats avg(hourly_count) as hourly_avg sum(hourly_count) as hourly_sum by USER
  | where hourly_count > (10*hourly_avg)
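
For context, here is a rough, untested sketch of what I mean by excluding the abnormal hours from the baseline, using the same 10x rule as a placeholder threshold:

  index=some_db sourcetype=syslog_transactions
  | bin _time span=1h
  | stats count as hourly_count by USER, _time
  | sort 0 _time
  | streamstats avg(hourly_count) as hourly_avg by USER
  | eval normal_count=if(hourly_count > 10*hourly_avg, null(), hourly_count)
  | eventstats avg(normal_count) as trimmed_avg by USER
  | where hourly_count > 10*trimmed_avg

The eval drops the abnormal hours to null so they do not feed into the trimmed average.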

Please advise if there is a better way to do this.

Thank you


woodcock
Esteemed Legend

This kind of thing is why summary indexing is still a thing; you should invest in setting one up for this use case:

http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usesummaryindexing

Log_wrangler
Builder

Thank you for your reply, I will research this.


Log_wrangler
Builder

Summary index is part of the solution for sure, thank you.

woodcock
Esteemed Legend

That is what Up-voting is for!


Log_wrangler
Builder

Well, ok, but in all fairness, just referencing a doc... compared to the other responses, that does not warrant it. The others are providing solutions that don't require summary indexes. If you provide an example of a summary index with comments, then I would accept your answer as the best. Or, if you wish, I will create a new question that you could answer? Thank you


DalJeanis
Legend

First, run this weekly for all time to set up your lookup.

 index=some_db sourcetype=syslog_transactions latest=@d
   | bin _time span=1h
   | stats count as hourly_count by USER, _time
   | bin _time as Day span=1d
   | eventstats count as active_hours by USER, Day
   | stats avg(hourly_count) as historic_avg
        stdev(hourly_count) as historic_stdev
        p95(hourly_count) as historic_p95
        max(hourly_count) as historic_max
        avg(active_hours) as active_avg
        max(active_hours) as active_max
        by USER
   | outputlookup append=false myUserHistory.csv
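
As an optional sanity check (not part of the solution), something like this could be used afterwards to verify the lookup contents; the user name is just a placeholder:

   | inputlookup myUserHistory.csv
   | where USER="someuser"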

Next, here is your alert.

 index=some_db sourcetype=syslog_transactions earliest=-1d@d latest=@d
   | bin _time span=1h
   | stats count as hourly_count by USER, _time
   | lookup myUserHistory.csv USER OUTPUT historic_avg historic_stdev historic_p95 historic_max
   | where hourly_count > historic_avg + 2 * historic_stdev

You can play with the stdev multiplier in the last line, or use the p95 value or max value.
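
For example, any of these variants of the final line could serve as the trigger condition (illustrative only; pick one and tune it):

   | where hourly_count > historic_avg + 3 * historic_stdev
   | where hourly_count > historic_p95
   | where hourly_count > historic_max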

You will notice that I've added a historical count of daily active hours. That's just in case you'd like to alert if some process is hitting your system under a person's id around the clock.
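
A rough, untested sketch of how those fields could be used for that, comparing yesterday's count of distinct active hours against the historical maximum from the lookup:

 index=some_db sourcetype=syslog_transactions earliest=-1d@d latest=@d
   | bin _time span=1h
   | stats count as hourly_count by USER, _time
   | stats dc(_time) as active_hours_today by USER
   | lookup myUserHistory.csv USER OUTPUT active_avg active_max
   | where active_hours_today > active_max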

Log_wrangler
Builder

Thank you for your reply, I will test it.


Log_wrangler
Builder

Your answer has provided a lot to think about, but I had to rework it to display the stats view instead of just events.

Unfortunately, running an "All time" weekly report is not feasible; there is too much data. In this case Woodcock provided the answer for calculating a "running average".

I believe that a summary index is my best option in combination with your query.

What I was really after was a historical count by hour and user (i.e., an hourly count by user), and then to compare the average of that historical hourly count to the current hourly count within the last 24 hours, flagging where the current hourly count is greater (by some factor, TBD) than the historic hourly average, as sketched below.
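
Roughly, as a final comparison on top of the lookup approach (the 10 is just a placeholder factor):

   | lookup myUserHistory.csv USER OUTPUT historic_avg
   | where hourly_count > 10 * historic_avg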


DalJeanis
Legend

Hmmm. That's very strange. Did you say that backwards? Since both of my queries use the stats command, the events themselves would have been aggregated before output.

You could use this first part, over all time, or over the past 90 days ( earliest=-90d@d latest=@d ), or whatever, to create the data to collect into your initial summary index.

  index=some_db sourcetype=syslog_transactions
  latest=@d
    | bin _time span=1h
    | stats count as hourly_count by USER, _time
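
If that backfill is run as an ad-hoc search rather than as a scheduled search with summary indexing enabled, the results could be written out explicitly with collect, for example (the summary index name matches the one used below and would need to exist already):

  index=some_db sourcetype=syslog_transactions latest=@d
    | bin _time span=1h
    | stats count as hourly_count by USER, _time
    | collect index=yoursummaryindex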

After that, you run it daily as above ( earliest=-1d@d latest=@d ) to add the prior day's info, and then run the following to create that day's lookup, as in the prior post.

index=yoursummaryindex
| bin _time as Day span=1d
| eventstats count as active_hours by USER, Day
| stats avg(hourly_count) as historic_avg
     stdev(hourly_count) as historic_stdev
     p95(hourly_count) as historic_p95
     max(hourly_count) as historic_max
     avg(active_hours) as active_avg
     max(active_hours) as active_max
     by USER
| outputlookup append=false myUserHistory.csv

Sukisen1981
Champion

Hi,

You say - "For example, if user X has a normal range of transactions every hour (between 0-100/hr) and suddenly user X jumps up 10x the normal like 1k transactions/hr, then I want to be able to find this based on a calculation against the normal average for all time."
Now, if you do want the all-time average and sum by user, you have to run a subsearch to get them, and any search running over an 'All time' range is expensive by nature.
For example, append this eventstats after your streamstats and remove your where condition. Test the output by adding a where condition with any valid user name that returns results:
| eventstats avg(hourly_count) as x, sum(hourly_count) as y by USER | where USER="XXX"
Do you really want to go back to the beginning of time (all time)? Is it perhaps possible to run the eventstats over, say, the last week or so in a subsearch and have that as a baseline? You will always need a final where command to identify events over the threshold returned by the eventstats.
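
Put together with the query from the question, that would look roughly like this (untested; "XXX" is a placeholder user name):

 index=some_db sourcetype=syslog_transactions
   | bin _time span=1h
   | stats count as hourly_count by USER, _time
   | sort _time
   | streamstats avg(hourly_count) as hourly_avg sum(hourly_count) as hourly_sum by USER
   | eventstats avg(hourly_count) as x, sum(hourly_count) as y by USER
   | where USER="XXX"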


Log_wrangler
Builder

Thank you, I am reviewing your use of eventstats vs streamstats.


Log_wrangler
Builder

I have looked over your response, and it is a variation of what I was originally thinking. I think it is one way to go, but as you mention, it is very costly, so I probably won't use this option for my situation. However, thank you very much for your insight.
