Calculating outliers total for each user

jorjiana88
Path Finder

I have a test dataset containing users, timestamps, and the websites they access.
I used the Splunk Machine Learning Toolkit with the Detecting Outliers Assistant to find increases in the number of visits to job-search websites.

index="http" sourcetype="http" AND "monster.com" OR "careerbuilder.com" OR "job-hunt.org" OR "aol.com/jobs" OR "simplyhired.com" OR "yahoo.com/hotjobs"
| where user="ABB0427"
| sort _time | bucket _time span=1d

| stats count by _time, user
| eventstats median("count") as median
| eval absDev=(abs('count'-median))
| eventstats median(absDev) as medianAbsDev
| eval lowerBound=(median-medianAbsDev*exact(2)), upperBound=(median+medianAbsDev*exact(1))
| eval isOutlier=if('count' < lowerBound OR 'count' > upperBound, 1, 0)
| fields _time, "count", lowerBound, upperBound, isOutlier, *

The result for this user is 10 outliers.

Now I am trying to build a similar search that returns the number of outliers for each user. If I simply remove the filter for this user and run the search over all 1000 users, this user no longer has 10 outliers but 4. It looks like lowerBound and upperBound change when the user filter is removed, and that the same lowerBound and upperBound are applied to every user. I expected the bounds to be calculated separately for each user. I have attached some screenshots.

Any suggestions on how to calculate the outliers for each user?


aljohnson_splun
Splunk Employee

Hi @jorjiana88
You need to add user as a split-by field on both eventstats commands:

| eventstats median("count") as median by user
| eval absDev=(abs('count'-median))
| eventstats median(absDev) as medianAbsDev by user

Additionally, you may want to add

| makecontinuous _time

after your bucket command to fill in any empty time gaps.

I think that's all you're missing.
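
Putting the pieces together, a sketch of the full per-user search might look like the one below. It reuses the original query with `by user` added to both eventstats commands. The multipliers 2 and 1 are copied unchanged from the original post. The final `stats sum(isOutlier)` line and the field name `outlierCount` are additions that do not appear in the thread, included only to produce one outlier total per user. The parentheses around the OR terms are added for readability, and the `| sort _time` step is left out on the assumption that `stats` does not need sorted input.

```
index="http" sourcetype="http" ("monster.com" OR "careerbuilder.com" OR "job-hunt.org" OR "aol.com/jobs" OR "simplyhired.com" OR "yahoo.com/hotjobs")
| bucket _time span=1d
| stats count by _time, user
| eventstats median("count") as median by user
| eval absDev=(abs('count'-median))
| eventstats median(absDev) as medianAbsDev by user
| eval lowerBound=(median-medianAbsDev*exact(2)), upperBound=(median+medianAbsDev*exact(1))
| eval isOutlier=if('count' < lowerBound OR 'count' > upperBound, 1, 0)
| stats sum(isOutlier) as outlierCount by user
```

Appending `| search user="ABB0427"` to this search should show the total for the single user from the question, which makes it easy to compare against the filtered version.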

jorjiana88
Path Finder

Thanks a lot! When I split by user, the calculation is different for each user, as expected, but I still have another problem. I'm not sure, maybe I should ask a new question for this 🙂

The result is still not the same: I don't get 10 outliers for that user. I think the problem is earlier in the query.

This query, where I count for all users and filter only at the end, shows counts for only the first 9 days:

index="http" sourcetype="http" AND "monster.com" OR "careerbuilder.com" OR "job-hunt.org" OR "aol.com/jobs" OR "simplyhired.com" OR "yahoo.com/hotjobs"

| sort _time | bucket _time span=1d

| stats count as counts by _time, user
| search user=ABB0427

This one, where I filter by this user from the beginning, shows counts for 25 days:

index="http" sourcetype="http" AND "monster.com" OR "careerbuilder.com" OR "job-hunt.org" OR "aol.com/jobs" OR "simplyhired.com" OR "yahoo.com/hotjobs" AND abb0427
| sort _time | bucket _time span=1d

| stats count as counts by _time, user

aljohnson_splun
Splunk Employee

The difference is that the first one filters on the value of the user field, whereas the second one (25 days) searches for occurrences of the string abb0427 in the _raw field.

jorjiana88
Path Finder

Given this data, the result should be the same even when searching only for the string.
Actually, after removing the | sort _time, both queries return the same result, so the issue is solved. Thank you very much for the super fast response.
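
A possible explanation, which nobody in the thread confirms: the `sort` command returns at most 10,000 results by default, so sorting the events of all users before bucketing could drop the later days, while the single-user search stays under that limit. If sorted input were actually needed, a sketch like the following, using `sort 0` to lift the limit, should keep every event. Here the parentheses around the OR terms and the quoted `user="ABB0427"` are written out for clarity and are not taken from the thread.

```
index="http" sourcetype="http" ("monster.com" OR "careerbuilder.com" OR "job-hunt.org" OR "aol.com/jobs" OR "simplyhired.com" OR "yahoo.com/hotjobs")
| sort 0 _time
| bucket _time span=1d
| stats count as counts by _time, user
| search user="ABB0427"
```

Since `stats` groups by _time and user regardless of input order, simply leaving the sort out, as was done above, is the cheaper option.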
