Hi @jogonz20
From a Splunk perspective, the first stats sets the stage for the second stats to create additional statistics from it.
I'm not an expert in the matter, so bear with me.
For this type of thing you often need more than a simple count of logins per user, so you have to find a way to enrich and/or clean your dataset so that it helps you detect the outliers (suspicious account behaviour or brute-force logins in this case). This is known as pre-processing your dataset before wrangling/modeling in ML.
The first stats doesn't just count how many times a user logged in each day; it also gives you how many days a user logged in at all, which the second stats uses later, though that's easy to miss due to the rather complex mix of functions in there.
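I don't have your exact search in front of me, so take this as a rough sketch rather than your SPL; the index and field names (index=auth, action, user) are placeholder assumptions. A first stats in this kind of search typically looks something like:

index=auth action=login
| bin _time span=1d
| stats count BY user, _time

After this, each row is one user on one day with the number of logins for that day, so the number of rows per user is also the number of days that user logged in.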
That is what's happening inside the second stats: just imagine you run this search over the last 30 days.
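Continuing the same sketch (again, the field names are my assumptions, not necessarily what your search uses), the second stats then summarises those daily rows per user:

| stats count AS num_days
        avg(count) AS avg_logins_per_day
        stdev(count) AS stdev_logins_per_day
        max(count) AS max_logins_per_day
    BY user

Over a 30-day window, num_days tells you on how many of those days the user logged in at all, while avg and stdev describe their normal daily volume, which is exactly the baseline you compare against to flag a day as an outlier.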
By doing this you synthesise extra features from the ones you have and end up with more dimensions that will help you find the outliers.
As a side note, I believe the nulls are missing their (). That said, I wouldn't use null() in there; fill the null values with "0" or apply some filtering to remove them instead. But pay attention to the results whichever option you choose.
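For example, depending on where the nulls show up in your search, the fill option could be as simple as

| fillnull value=0 count

(replace count with whichever field carries the nulls), and the filtering option could be

| where isnotnull(count)

Keep in mind that filling with 0 includes the empty days in the average and stdev, while filtering leaves them out, so the baselines and any thresholds built on them will come out differently.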
I hope it was helpful.