Hello everyone,
I want to create an alert based on nginx logs using the Machine Learning Toolkit.
Basically, I would like to train a model to detect when a client (with an ID already in the logs) logs in, or attempts to log in, from an unusual location. Example: we have a client that usually connects from US East, but suddenly has a connection from Russia.
I have tried a few searches using iplocation, using DensityFunction and OneClassSVM, but haven't been able to create a model that correctly detects anomalies.
If anyone has any insight, or has done something like that before, I would appreciate the help.
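@tag-osrour - I have not used MLTK to implement this, but I've used a regular Splunk lookup to do what you need.
You need two scheduled searches. The first builds and maintains a lookup of the countries each user has logged in from over the last 12 weeks: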
| tstats count, values(Authentication.org_country) as org_country from datamodel=Authentication where Authentication.user!="unknown" by Authentication.app, Authentication.user, Authentication.src, _time span=1d
| `drop_dm_object_name(Authentication)`
| iplocation src
| eval Country = if(isnotnull(org_country), org_country, Country)
| inputlookup append=true authentication_usual_location.csv
| where _time > relative_time(now(), "-12w@w")
| dedup user, app, Country, _time
| outputlookup authentication_usual_location.csv

The second search flags logins from locations that are unusual for the user:
| tstats count from datamodel=Authentication by Authentication.app, Authentication.action, Authentication.user, Authentication.src, Authentication.dest, _time
| `drop_dm_object_name(Authentication)`
| eval user = lower(user)
| iplocation src
| inputlookup append=true authentication_usual_location.csv
| fillnull value=0 percentage_login_from_country
| where percentage_login_from_country < 15
| eval reason = case(isnull(usual_login_location), "No login from this user", percentage_login_from_country == 0, "No login from this country", true(), "Low historical login from this country")
| table _time user dest src app count City Region Country percentage_login_from_country reason usual_login_location
These are not full queries, but they give an idea of how you can implement this in your environment, with your own data, using Splunk lookups.
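The second search filters on percentage_login_from_country and usual_login_location, which the searches above never compute. As a rough sketch of one way to derive them, the search below aggregates authentication_usual_location.csv per user and country. It assumes the lookup holds the count, user, and Country fields written by the first search. The names country_logins and total_logins are purely illustrative, and treating usual_login_location as the list of every country seen for a user is an assumption, not part of the original searches.

```
| inputlookup authentication_usual_location.csv
| eval user = lower(user)
| stats sum(count) as country_logins by user, Country
| eventstats sum(country_logins) as total_logins, values(Country) as usual_login_location by user
| eval percentage_login_from_country = round(100 * country_logins / total_logins, 2)
| table user Country percentage_login_from_country usual_login_location
```

The result could be saved to its own lookup and matched against new logins. Matching usual_login_location on user alone and percentage_login_from_country on user and Country would probably work best, so that a known user logging in from a new country still has a usual_login_location value.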
I hope this helps!!! Kindly upvote if it does.