Hi guys,
I want to check login behavior on a per-app basis, in short, to see when most logins happen. For example, if an application's logins follow US business hours (~9am to ~6pm) but we see a login spike at 1am, that is probably something strange. That is the sort of thing I'd like to find.
Can anyone help me write the SPL for this?
Any suggestions will be helpful.
Note: We are not using Splunk Enterprise Security in our environment.
Hi @abhinav_bel,
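if you want to include only working hours, you have to add another condition to your search.
For example, with working hours from 8:00 to 17:00, Monday to Friday, try something like this (time_hour is assumed to be a field holding the hour of the event; Splunk's default equivalent is date_hour):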
your_search (time_hour>7 time_hour<18) NOT (date_wday=sunday OR date_wday=saturday)
| ...
But this does not account for holidays.
To handle holidays, you have to create a lookup containing all the dates of a year and a code that identifies whether each date is a working day or a holiday.
I used this approach and created a macro (called "Non_Working_Days"), so I can change the working hours in one place instead of in every search:
[Non_Working_Days]
definition = | lookup SIEMCAL.csv Day AS TimeStamp_Day OUTPUT Type \
| search Type=2 OR (Type=1 (time_hour>14 OR time_hour<8))
iseval = 0
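As a rough sketch of how the macro could be called: because its definition starts with a pipe, it is placed directly after the preceding command. The eval line is an assumption, it supposes that TimeStamp_Day and time_hour are not already extracted and that the Day column in SIEMCAL.csv uses the YYYY-MM-DD format; the app field is also hypothetical. Adjust all of these to your data.

```spl
your_search
| eval TimeStamp_Day=strftime(_time, "%Y-%m-%d"), time_hour=tonumber(strftime(_time, "%H"))
`Non_Working_Days`
| timechart span=1h count BY app
```

This keeps only the events that the macro classifies as outside working time and charts them per hour and per app.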
Ciao.
Giuseppe
Collect your login counts by the hour and tag each count with the day of the week and the hour (strftime format "%w%H"). Do this both for your current data and for an average over the last month or two, or whatever time period is appropriate. Then compare each app's count for a given day-and-hour slot to that app's average for the same slot to see if there is a marked change. Alternatively, look at the machine learning options available with Splunk.
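A minimal sketch of that comparison, under some assumptions: your_index and your_auth_sourcetype are placeholders, app is a hypothetical field holding the application name, the 60-day window is arbitrary, and the 3-standard-deviation threshold is only a starting point, not a recommendation.

```spl
index=your_index sourcetype=your_auth_sourcetype earliest=-60d@d latest=@h
| bin _time span=1h
| stats count AS logins BY app, _time
| eval week_hour=strftime(_time, "%w%H")
| eventstats avg(logins) AS avg_logins, stdev(logins) AS stdev_logins BY app, week_hour
| where _time >= relative_time(now(), "-24h@h") AND logins > avg_logins + 3 * stdev_logins
```

Note that stats only returns hours with at least one login, so slots with zero logins are missing from the average, and the last 24 hours are themselves part of the baseline. Both points may need handling before relying on the result.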
Thanks for the alternative; machine learning in Splunk seems better.
I know I need to do more than just average + standard deviation (unless login volume really does follow a Gaussian distribution), so I need to figure out a good method for baselining. I will probably start by exploring a basic five-number summary (min, max, median, first quartile, third quartile), keeping in mind the cyclic nature of the data (e.g., business hours and workdays vs. off-hours and weekends).
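One possible way to explore the five-number summary per app and per weekly hour slot is sketched below. The index, sourcetype, and app field are placeholders, and the upper fence of Q3 + 1.5 * IQR (Tukey's rule) is a technique swapped in here for illustration, not something established in this thread.

```spl
index=your_index sourcetype=your_auth_sourcetype earliest=-60d@d latest=@h
| bin _time span=1h
| stats count AS logins BY app, _time
| eval week_hour=strftime(_time, "%w%H")
| eventstats min(logins) AS min_logins, perc25(logins) AS q1, median(logins) AS med, perc75(logins) AS q3, max(logins) AS max_logins BY app, week_hour
| eval iqr=q3-q1, upper_fence=q3 + 1.5 * iqr
| where logins > upper_fence
```

With 60 days of data each slot has only about eight or nine samples, so the quartiles are coarse; a longer window or grouping by hour and weekday/weekend may give a steadier baseline.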
I am not sure how to approach this with machine learning: which algorithm or method to start with, and what search query to use to feed the algorithm for my use case.
Can you help me proceed with this?