Splunk Search

Find the numerical outliers of the count for countries by day

parwindertaank
Explorer

I have the following which gives me the count of events that happen on separate days for each country.
Column1: Date, Column2: Country and Column3: Count

index="sample" sourcetype ="access_combined_wcookie" clientip=* | iplocation clientip | convert timeformat="%Y-%m-%d" ctime(_time) AS date | stats count by date, Country

For each Country, there are many days that occur that have events occurring on that day in total count.
I want to find the dates that have an abnormally high amount of events for each country and these would be the outliers.

How can I update my code to obtain this result?

0 Karma
1 Solution

elliotproebstel
Champion

The final solution will depend on how you want to define an outlier. One option would be to identify dates where the count of events was greater than 2 standard deviations above the average count for that country:

index="sample" sourcetype ="access_combined_wcookie" clientip=* 
| iplocation clientip 
| convert timeformat="%Y-%m-%d" ctime(_time) AS date 
| stats count by date, Country 
| eventstats avg(count) as avg_count stdev(count) as stdev_count BY Country 
| where count>(avg_count+(2*stdev_count))

View solution in original post

parwindertaank
Explorer

This is great thank you for your response.

Can this action also be performed using the Machine Learning Toolkit for Numerical Outliers?
Would the field to analyze be Count? And fields to split by Date and Country ?

Thanks in advance.

0 Karma

elliotproebstel
Champion

Honestly, I've never used the Machine Learning Toolkit, so I can't tell you. If you want to get some answers to that, I'd recommend a new post with that specific question - otherwise, this conversation will likely get buried.

0 Karma

elliotproebstel
Champion

The final solution will depend on how you want to define an outlier. One option would be to identify dates where the count of events was greater than 2 standard deviations above the average count for that country:

index="sample" sourcetype ="access_combined_wcookie" clientip=* 
| iplocation clientip 
| convert timeformat="%Y-%m-%d" ctime(_time) AS date 
| stats count by date, Country 
| eventstats avg(count) as avg_count stdev(count) as stdev_count BY Country 
| where count>(avg_count+(2*stdev_count))

chandana204
Communicator

Hi, I came across same kind of problem. I used this solution as reference. However, When I executed above search everything is working fine but stdev(count) is not working. It is only working when i remove BY country from the search. How can i find Standard deviation for every country?

Thanks,
Chandana

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?

In the age of AI, every tool promises to make our lives easier. From summarizing content to writing code, ...

Data Persistence in the OpenTelemetry Collector

This blog post is part of an ongoing series on OpenTelemetry. What happens if the OpenTelemetry collector ...

Thanks for the Memories! Splunk University, .conf25, and our Community

Thank you to everyone in the Splunk Community who joined us for .conf25, which kicked off with our iconic ...