Using Machine Learning Toolkit to detect auth abuse

patpro
Path Finder

Hello,

I’m trying to tune the Machine Learning Toolkit to detect authentication abuse on a web portal (based on Lemon LDAP-NG).

My logs look like this:

(time/host/... header) client=(IP address) user=(login) sessionID=(session-id) mail=(user email address) action=(various statuses: connected / non-existent user / wrong pwd…)
 
I would like to train the Machine Learning Toolkit to detect anomalies such as:
- a client has made auth attempts for an unusual number of distinct logins
- a client has made auth attempts for both non-existent and existing users
- …
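To make the first two criteria concrete, here is a rough sketch of how they could be computed as per-client, per-hour features. It reuses the base search, rex, and case logic from the training search in this post. The field names dcUnknownUsers, dcKnownUsers, and mixedTargets are illustrative assumptions and are not used by the model described here.

```spl
index="webauth" ( TERM(was) TERM(not) TERM(found) TERM(in) TERM(LDAP) ) OR TERM(connected) OR TERM(credentials) linecount=1
| rex "action=(?<act>.*)"
| eval action=case(match(act,".* connected"), "connected", match(act,".* was not found in LDAP directory.*"), "unknown", match(act, ".* credentials"),"wrongpassword")
``` drop events whose status matched none of the three patterns ```
| where isnotnull(action)
| bin span=1h _time
``` per client and hour: distinct logins overall, and split by unknown vs. known users ```
| stats dc(user) AS dcUsers,
        dc(eval(if(action="unknown", user, null()))) AS dcUnknownUsers,
        dc(eval(if(action!="unknown", user, null()))) AS dcKnownUsers
        BY client, _time
``` hypothetical flag: 1 when the client targeted both non-existent and existing users in the same hour ```
| eval mixedTargets=if(dcUnknownUsers>0 AND dcKnownUsers>0, 1, 0)
```

Here dcUsers corresponds to the "unusual number of logins" criterion and mixedTargets to the "both non-existent and existing users" criterion; whether these are the right inputs for a model is exactly what I am unsure about.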
 
So far, this has not worked at all.
 
I’ve trained a model like this on approximately one month of data:

 

index="webauth" ( TERM(was) TERM(not) TERM(found) TERM(in) TERM(LDAP) ) OR TERM(connected) OR TERM(credentials) linecount=1
| rex "action=(?<act>.*)"
| eval action=case(match(act,".* connected"), "connected", match(act,".* was not found in LDAP directory.*"), "unknown", match(act, ".* credentials"),"wrongpassword")
| bin span=1h _time
| eventstats dc(user) AS dcUsers, count(user) AS countUsers BY client,_time,action
| search dcUsers>1
| stats values(dcUsers) AS DCU, values(countUsers) AS CU BY client,_time,action
| eval HourOfDay=strftime(_time,"%H")
| fit DensityFunction CU by "client,DCU" as outlier into app:TEST

 

 
Then I tested the model on another time interval where I know there is a big anomaly, replacing the fit command with "apply (model-name) threshold=(various values)".
The search returned no results.
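For reference, a sketch of what that test search looks like: the same preprocessing as the training search, with fit swapped for apply. The model name app:TEST comes from the training search; the threshold value 0.01 is only a placeholder for illustration, not one of the values actually tried, and the time range is assumed to be set in the time picker.

```spl
index="webauth" ( TERM(was) TERM(not) TERM(found) TERM(in) TERM(LDAP) ) OR TERM(connected) OR TERM(credentials) linecount=1
| rex "action=(?<act>.*)"
| eval action=case(match(act,".* connected"), "connected", match(act,".* was not found in LDAP directory.*"), "unknown", match(act, ".* credentials"),"wrongpassword")
| bin span=1h _time
| eventstats dc(user) AS dcUsers, count(user) AS countUsers BY client,_time,action
| search dcUsers>1
| stats values(dcUsers) AS DCU, values(countUsers) AS CU BY client,_time,action
``` apply the saved density model instead of fitting; threshold=0.01 is a placeholder value ```
| apply app:TEST threshold=0.01
```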
 
So I guess I’m not on the right track to achieve this. Any help appreciated!
 