Splunk Search

How to calculate limits for agents based on their activity?

exmuzzy
Explorer

I want to receive notifications if agents lower or exceed their normal activity for the current day of the week and hour.
I'm going to achieve this by creating a daily lookup table with the counted perc90 number of events per day of the week / hour

The search to create daily lookup table looks like this

index=api* | stats count(eval(result=="OK")) as success, count(eval(result!="OK")) as fail by agent, method, date_month, date_mday, date_wday, date_hour, date_minute | stats min(success) as min_success, min(fail) as min_fail, perc90(success) as perc90_success, perc90(fail) as perc90_fail, max(success) as max_success, max(fail) as max_fail by agent, method, date_wday, date_hour, date_minute | strcat agent "@" method "@" date_wday date_hour ":" date_minute key | fields key, min_success, min_fail, perc90_success, perc90_fail, max_success, min_fail | outputlookup agent_limits append=False key_field=key

But now I'm in doubt how to move on.
How to get overview filling that agents activity in normal ranges?
should it be alert or dashboard or pivot?

how to make a correction manually, if the agent has gone beyond the ranges of his normal activity, but this is the same norm or just growth over time?

0 Karma

DalJeanis
Legend

You're a bit deep into the weeds. The first thing you need to do is understand your data.

What you have there is the minimum, maximum, and 90th percentile of activity for that particular hour and minute of that day of the week. Obviously, I don't know your data, but is that really a reasonable representation of how your data changes? Is 4 AM on Monday that much different from 4 AM on Tuesday? Is 4:03 AM on Wednesday really that different from 4:05 AM on Wednesday?

First, run this and look at the results. What is the shape of your data?

index = api*  agent=* method=*
| bin _time span=1m
| stats count(eval(result=="OK")) as success, count(eval(result!="OK")) as fail by agent, method, date_wday, date_hour, date_minute, _time
| eval total= success+fail
| eval good = round(100*success/total,0)
| eval bad = round(100*fail/total,0)
| stats min(success) as min_success, min(fail) as min_fail, min(total) as min_total, 
    perc90(success) as perc90_success, perc90(fail) as perc90_fail, perc90(total) as perc90_total, 
    max(success) as max_success, max(fail) as max_fail, max(total) as max_total, 
    avg(success) as avg_success, avg(fail) as avg_fail, avg(total) as avg_total, 
    stdev(success) as stdev_success, stdev(fail) as stdev_fail, stdev(total) as stdev_total, 
    perc90(good) as perc90_good, perc90(bad) as perc90_bad,
    min(good) as min_good, min(bad) as min_bad,
    max(good) as max_good, max(bad) as max_bad,
    avg(good) as avg_good, avg(bad) as avg_bad,
    stdev(good) as max_good, stdev(bad) as stdev_bad,
    max(_time) as _time
    by agent, method, date_wday, date_hour, date_minute 
| table _time agent method min_* perc* max_* avg* stdev* date*
| outputcsv mytest.csv 

That gives you a csv file of your results for this particular moment in time, back one full week. Then you can use the csv of results to explore the data. That will give you a picture of your average week. From there, there are literally hundreds of ways to go, depending on what you find.

You are trying to understand the features of your data, and find out what is most important. How does the total number of transactions ebb and flow? Are there times of day that are different in character? Are there agents or methods that are more volatile than others? More predictable?

0 Karma

DalJeanis
Legend

Is the number of transactions that are good most important, or the percentage?

How does the 90th percentile relate to the avg and the stdev? Basically, this is the z score of the 90th percentile | eval z_x=round( (perc90_x - avg_x)/stdev_x , 2) That's usually going to be about 1.28, but the amount by which it is different might tell you something about the distribution you are looking at.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Day 0

Hello Splunk Community! My name is Chris, and I'm based in Canberra, Australia's capital, and I travelled for ...

Enhance Security Visibility with Splunk Enterprise Security 7.1 through Threat ...

(view in My Videos)Struggling with alert fatigue, lack of context, and prioritization around security ...

Troubleshooting the OpenTelemetry Collector

  In this tech talk, you’ll learn how to troubleshoot the OpenTelemetry collector - from checking the ...