Warning: Splunk noob question.
I have a base search:
source="Administrator_logs" name="An account failed to log on"
Using https://community.splunk.com/t5/Splunk-Search/Getting-Average-Number-of-Requests-Per-Hour/m-p/73506 I can calculate hourly averages:
source="Administrator_logs" name="An account failed to log on" | eval reqs = 1 | timechart span=1h per_hour(reqs) as AvgReqPerHour
What I would like to do is calculate a baseline. Having never done this before, my thought is to calculate the hourly average and either the standard deviation and/or some percentile, e.g. the 90th, over all events, as opposed to just the last day/week/month (although that would be interesting too).
Eventually, this baseline calculation will be the basis for an alert, e.g. fire an alert if the hourly count is more than one standard deviation from the average or above the 90th percentile.
Q1: How do I calculate the hourly average for all events?
Q2: How do I calculate the hourly standard deviation for all events?
Q3: How do I calculate the hourly 90th percentile for all events?
This assumes your data is normally distributed. If it is not, you may need to transform your data before calculating statistics.
The timechart count aggregation should be sufficient for counting by hour.
Following that, you can extract the hour from _time and use the stats command to calculate the average, standard deviation, and 90th percentile by hour.
Here's an example using random counts:
| makeresults count=10000
| eval _time=_time-_time%3600-604800*random()/2147483647 ```uniformly distributed over 7 days```
| timechart fixedrange=f span=1h count
| eval date_hour=strftime(_time, "%H")
| stats avg(count) as avg_count stdev(count) as sd_count p90(count) as p90_count by date_hour
Using your source:
source="Administrator_logs" name="An account failed to log on" earliest=-7d@h latest=@h
| timechart span=1h count
| eval date_hour=strftime(_time, "%H")
| stats avg(count) as avg_count stdev(count) as sd_count p90(count) as p90_count by date_hour
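To get from the baseline to the alert you described, one option is to swap stats for eventstats, which attaches the per-hour baseline to each hourly count instead of collapsing the rows, and then filter with where. This is a sketch (the thresholds of one standard deviation and the 90th percentile are just the examples from your question):

source="Administrator_logs" name="An account failed to log on" earliest=-7d@h latest=@h
| timechart span=1h count
| eval date_hour=strftime(_time, "%H")
| eventstats avg(count) as avg_count stdev(count) as sd_count p90(count) as p90_count by date_hour
| where count > avg_count + sd_count OR count > p90_count

Saved as a scheduled search with "number of results > 0" as the trigger condition, this would alert whenever an hourly count exceeds its baseline for that hour of day.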