Hey everyone! I’m currently working to implement ML-based detection on my authentication logs. I’ve already created an algorithm to find and report outliers across all of my failed authentication logs (using the Detect Numeric Outliers method in the ML Toolkit). However, I’m now trying to create an algorithm that detects outliers separately for each individual IP address, analyzing each IP’s activity trends and creating thresholds specific to that IP.
I know there is a split-by-field option, but I don’t see a way to incorporate time when analyzing each IP address separately. Does anyone have any input or ideas on how to solve this? Any help would be greatly appreciated!
I'm not sure what you mean by a couple of items in your question.
When you say "outliers in your failed authentication logs", on what metric or scale are these events outliers?
Second, when you say "outliers by IP address", what is the underlying basis for believing that IP addresses should have different characteristics on the metric or scale you are looking at?
Third, what does _time have to do with it?
I'm assuming the word "trend" is where you're getting the idea that you need a time factor. That's fine, as far as it goes, but your use of ML in this process seems like someone who really wants to use a particular tool no matter what he's applying it to. (There's an old saying: "To a man with a hammer, everything looks like a nail.")
So, please back up and give us some information about what your use case is, and what kind of outlier you are detecting with this process. I'm thinking that if you want trending and such, then despite the fact that ML is awesome and trendy, it's the wrong tool for the job.
Hey DalJeanis! Thank you so much for the reply!
My use case for the ML Toolkit is to predict a threshold range, for every day/hour of the week, for the number of users logging in (or failing to log in) to each portal in my product. I have login logs in Splunk that capture successful and failed login attempts, along with information about the portal the user intends to visit after authentication. Users heading to portal X generate a different number of login events than users heading to portal Y. My use case, therefore, is to measure past login events for each of the several hundred thousand distinct portals in order to predict the appropriate number of login events for each portal on an hourly or daily basis; the outliers are the situations where there are more (or suspiciously fewer) logins to portal X than the model would predict.
Now to clarify my question: first, by “outliers in my failed authentication logs” I mean unusually high counts of failed authentications (as shown by my logs) within a period of time. Using the Detect Numeric Outliers showcase example in the ML Toolkit, I was able to create an algorithm that detects an anomalous number of failed logins in a one-hour time span across my entire product. Now, I want to dig deeper and analyze each portal separately.
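Roughly speaking, the search I ended up with from the showcase has this shape (the index, `action` filter, and standard-deviation multiplier are just placeholders from my environment, not exact values):

```
index=auth action=failure
| bin _time span=1h
| stats count AS failures BY _time
| eventstats avg(failures) AS avg stdev(failures) AS stdev
| eval lowerBound=(avg - stdev*2), upperBound=(avg + stdev*2)
| eval isOutlier=if(failures < lowerBound OR failures > upperBound, 1, 0)
```

That is, one global baseline (avg ± 2 stdev) computed over all hourly failure counts, which is what I now want to break out per portal.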
Regarding your second question, I initially wanted to split this analysis by IP address (to compare the trends of one IP address against others). Since I last posted, I’ve realized that splitting by Portal ID is a more effective way to accomplish my goal. In any given week, there is a pattern of login attempts that holds across all customer portals. Instead of hardcoding a threshold number, I ideally want a range of failed login attempts for each portal for any given hour of the day, AND to alert when there is a suspiciously high (or low) number of logins coming from a portal.
Lastly, for your third question, time is everything in this practice. I want to identify what is “common” in every time period of the day/week. Using ML and the six months of data in my Splunk instance, I want to identify the usual number of login events for each portal on an hourly or daily basis.
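In other words, I’m picturing something like the following sketch, where the baseline is computed per portal and per hour-of-week by grouping the `eventstats` on both fields (again, the index, field names like `portal_id`, and the multiplier are placeholders for my data, not a finished search):

```
index=auth action=failure earliest=-6mon@d
| bin _time span=1h
| stats count AS failures BY _time, portal_id
| eval HourOfWeek=strftime(_time, "%w-%H")
| eventstats avg(failures) AS avg stdev(failures) AS stdev BY portal_id, HourOfWeek
| eval lowerBound=(avg - stdev*2), upperBound=(avg + stdev*2)
| eval isOutlier=if(failures < lowerBound OR failures > upperBound, 1, 0)
| where isOutlier=1
```

The idea is that each (portal, hour-of-week) pair gets its own expected range, so Monday 09:00 for portal X is compared only against past Monday 09:00s for portal X.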
Do you have any suggestions for how to use ML for this use case? Or do you think I’m still trying to hammer something that isn’t a nail? Again, I appreciate your help and look forward to any input!