<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic how to calculate a running average of events by user for 'All time' (Part 2) in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416994#M119996</link>
    <description>&lt;P&gt;I am looking for a way to compare an hourly ave(count) with the All time historic average.&lt;/P&gt;

&lt;P&gt;Below is a sample query provided by FrankVl.  &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;  index=some_db sourcetype=syslog_transactions
      |bin _time span=1h
      | stats count as hourly_count by USER, _time
         | sort _time
      | streamstats avg(hourly_count) as historic_avg by USER
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This works fine for last 24 hours; but now I want to look at the last 24 hours "hourly_count" and compare it to an 'All time'  running ave(hourly_count).&lt;/P&gt;

&lt;P&gt;My overall objective is to create a behavioral detection where I can also set a threshold to see when transactions spike up.&lt;/P&gt;

&lt;P&gt;For example, if user X has a normal range of transactions every hour  (between 0-100/hr) and suddenly user X jumps up 10x the normal like 1k transactions/hr,  then I want to be able to find this based on a calculation against the normal average for all time.    &lt;/P&gt;

&lt;P&gt;And then I want to exclude the hourly_counts that are abnormal (like 1000/hr vs 10/hr , 10/hr being normal) from the 'All time' average.&lt;/P&gt;

&lt;P&gt;I reworked the above query and I can change the WHERE statement.  But I feel like I am reinventing a wheel, as I am sure someone has probably needed a similar solution.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=some_db sourcetype=syslog_transactions
       |bin _time span=1h
       | stats count as hourly_count by USER, _time
       | sort _time
       | streamstats avg(hourly_count) sum(hourly_count) by USER
       | WHERE hourly_count &amp;gt; (10*hourly_avg)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Please advise if there is a better way to do this.&lt;/P&gt;

&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 20:14:47 GMT</pubDate>
    <dc:creator>Log_wrangler</dc:creator>
    <dc:date>2020-09-29T20:14:47Z</dc:date>
    <item>
      <title>how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416994#M119996</link>
      <description>&lt;P&gt;I am looking for a way to compare an hourly ave(count) with the All time historic average.&lt;/P&gt;

&lt;P&gt;Below is a sample query provided by FrankVl.  &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;  index=some_db sourcetype=syslog_transactions
      |bin _time span=1h
      | stats count as hourly_count by USER, _time
         | sort _time
      | streamstats avg(hourly_count) as historic_avg by USER
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This works fine for last 24 hours; but now I want to look at the last 24 hours "hourly_count" and compare it to an 'All time'  running ave(hourly_count).&lt;/P&gt;

&lt;P&gt;My overall objective is to create a behavioral detection where I can also set a threshold to see when transactions spike up.&lt;/P&gt;

&lt;P&gt;For example, if user X has a normal range of transactions every hour  (between 0-100/hr) and suddenly user X jumps up 10x the normal like 1k transactions/hr,  then I want to be able to find this based on a calculation against the normal average for all time.    &lt;/P&gt;

&lt;P&gt;And then I want to exclude the hourly_counts that are abnormal (like 1000/hr vs 10/hr , 10/hr being normal) from the 'All time' average.&lt;/P&gt;

&lt;P&gt;I reworked the above query and I can change the WHERE statement.  But I feel like I am reinventing a wheel, as I am sure someone has probably needed a similar solution.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=some_db sourcetype=syslog_transactions
       |bin _time span=1h
       | stats count as hourly_count by USER, _time
       | sort _time
       | streamstats avg(hourly_count) sum(hourly_count) by USER
       | WHERE hourly_count &amp;gt; (10*hourly_avg)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Please advise if there is a better way to do this.&lt;/P&gt;

&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 20:14:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416994#M119996</guid>
      <dc:creator>Log_wrangler</dc:creator>
      <dc:date>2020-09-29T20:14:47Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416995#M119997</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;You say - "For example, if user X has a normal range of transactions every hour (between 0-100/hr) and suddenly user X jumps up 10x the normal like 1k transactions/hr, then I want to be able to find this based on a calculation against the normal average for all time."&lt;BR /&gt;
Now, if you do want the all time average and sum by user, you have to run a sub search to get the same. Any search running on an all time range is expensive by nature.&lt;BR /&gt;
For example append this eventstats after your streamstats  and remove your  where condition. Test the out by providing any valid user name that returns results by adding a where condition&lt;BR /&gt;
    eventstats avg(hourly_count) as x,sum(hourly_count) as y by  USER | where USER="XXX"&lt;BR /&gt;
Do you really want to go back to the beginning of time (all time), is it perhaps possible to run the eventstats over say last 1 week or so in a sub search and have that as a baseline. You will always need a last where command to identify events over the threshold returned by the eventstas&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 20:15:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416995#M119997</guid>
      <dc:creator>Sukisen1981</dc:creator>
      <dc:date>2020-09-29T20:15:00Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416996#M119998</link>
      <description>&lt;P&gt;First, run this weekly for all time to set up your lookup. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index=some_db sourcetype=syslog_transactions
 latest=@d
   | bin _time span=1h
   | stats count as hourly_count by USER, _time
   | bin _time as Day 
   | eventstats count as active_hours by USER, Day
   | stats avg(hourly_count) as historic_avg 
        stdev(hourly_count) as historic_stdev  
        p95(hourly_count) as historic_p95  
        max(hourly_count) as historic_max 
        avg(active_hours) as active_avg
        max(active_hours) as active_max
        by USER
   | outputlookup  append=f  myUserHistory.csv
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Next, here is your alert.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index=some_db sourcetype=syslog_transactions
 earliest=-1d@d latest=@d
   | bin _time span=1h
   | stats count as hourly_count by USER, _time
   | lookup myUserHistory.csv USER OUTPUT  historic_avg  historic_stdev  historic_p95 historic_max
   | where  hourly_count &amp;gt; historic_avg + 2 * historic_stdev
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You can play with the stdev multiplier in the last line, or use the p95 value  or max value.&lt;/P&gt;

&lt;P&gt;You will notice that I've added a historical count of daily active hours.  That's just in case you'd like to alert if some process is hitting your system under a person's id around the clock.  &lt;/P&gt;</description>
      <pubDate>Sat, 30 Jun 2018 19:42:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416996#M119998</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2018-06-30T19:42:56Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416997#M119999</link>
      <description>&lt;P&gt;This kind of thing is why &lt;CODE&gt;summary index&lt;/CODE&gt; is still a thing; you should invest in setting up one for this use case:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usesummaryindexing" target="test_blank"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usesummaryindexing&lt;/A&gt;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 01 Jul 2018 00:11:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416997#M119999</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2018-07-01T00:11:49Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416998#M120000</link>
      <description>&lt;P&gt;Thank you, I am reviewing your use of eventstats vs streamstats.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jul 2018 16:30:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416998#M120000</guid>
      <dc:creator>Log_wrangler</dc:creator>
      <dc:date>2018-07-03T16:30:57Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416999#M120001</link>
      <description>&lt;P&gt;Thank you for your reply, I will research this.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jul 2018 16:31:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/416999#M120001</guid>
      <dc:creator>Log_wrangler</dc:creator>
      <dc:date>2018-07-03T16:31:39Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417000#M120002</link>
      <description>&lt;P&gt;Thank you for your reply, I will test it ok. &lt;/P&gt;</description>
      <pubDate>Tue, 03 Jul 2018 16:31:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417000#M120002</guid>
      <dc:creator>Log_wrangler</dc:creator>
      <dc:date>2018-07-03T16:31:59Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417001#M120003</link>
      <description>&lt;P&gt;Your answer has provide a lot to think about but I had to rework it to display the stats view instead of just events.&lt;/P&gt;

&lt;P&gt;Unfortunately running an "All time" weekly report is not feasible, there is too much data.  In this case Woodcock  provided the answer to calculate "running average".&lt;/P&gt;

&lt;P&gt;I believe that a summary index is my best option in combination with your query.&lt;/P&gt;

&lt;P&gt;What I was really after was a historical count by hour and user, (as hourly count by user), then compare the average of the historical hourly count by user to the current hourly count within the last 24 hours, where the current hourly count is greater (by some power TBD)  than historic hourly average.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jul 2018 14:21:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417001#M120003</guid>
      <dc:creator>Log_wrangler</dc:creator>
      <dc:date>2018-07-06T14:21:11Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417002#M120004</link>
      <description>&lt;P&gt;Summary index is part of the solution for sure, thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jul 2018 14:22:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417002#M120004</guid>
      <dc:creator>Log_wrangler</dc:creator>
      <dc:date>2018-07-06T14:22:29Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417003#M120005</link>
      <description>&lt;P&gt;That is what &lt;CODE&gt;Up-voting&lt;/CODE&gt; is for!&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jul 2018 15:27:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417003#M120005</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2018-07-06T15:27:37Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417004#M120006</link>
      <description>&lt;P&gt;Hmmm.  That's very strange. Did you say that backwards?  Since both of my queries use the stats command, the events themselves would have been aggregated before output.&lt;/P&gt;

&lt;P&gt;You could use this first part, over all time, or over the past 90 days  ( &lt;CODE&gt;earliest=-90d@d latest=@d&lt;/CODE&gt; ), or whatever,  to create data to &lt;CODE&gt;collect&lt;/CODE&gt; to your initial summary index.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;  index=some_db sourcetype=syslog_transactions
  latest=@d
    | bin _time span=1h
    | stats count as hourly_count by USER, _time
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;After that, you run it daily as above ( &lt;CODE&gt;earliest=-1d@d latest=@d&lt;/CODE&gt; ) to update with the prior day's info, and then the following to create that day's lookup as per the prior post.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=yoursummaryindex
| bin _time as Day 
| eventstats count as active_hours by USER, Day
| stats avg(hourly_count) as historic_avg 
     stdev(hourly_count) as historic_stdev  
     p95(hourly_count) as historic_p95  
     max(hourly_count) as historic_max 
     avg(active_hours) as active_avg
     max(active_hours) as active_max
     by USER
| outputlookup  append=f  myUserHistory.csv
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 08 Jul 2018 17:33:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417004#M120006</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2018-07-08T17:33:50Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417005#M120007</link>
      <description>&lt;P&gt;Well, ok, but in all fairness, just referencing a doc... in comparison to the other responses does not warrant it.     The others are providing solutions that don't require summary indexes.  If you provide an example of summary index with comments, then I would accept your answer as the best.  Or if you wish I will create a new question that you could answer?   Thank you&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jul 2018 13:21:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417005#M120007</guid>
      <dc:creator>Log_wrangler</dc:creator>
      <dc:date>2018-07-09T13:21:36Z</dc:date>
    </item>
    <item>
      <title>Re: how to calculate a running average of events by user for 'All time' (Part 2)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417006#M120008</link>
      <description>&lt;P&gt;I have looked over your response and it is a variation of what I was originally thinking, and I think it is one way to go with this but as you mention very costly.   So I probably won't this option for my situation.  However, thank you very much for your insight.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jul 2018 13:23:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/how-to-calculate-a-running-average-of-events-by-user-for-All/m-p/417006#M120008</guid>
      <dc:creator>Log_wrangler</dc:creator>
      <dc:date>2018-07-09T13:23:48Z</dc:date>
    </item>
  </channel>
</rss>

