I'm looking to create a large number of searches that will identify suspicious security events. An example of the logic would be as follows:
If the number of failed logins from src_ip in the first minute is > 3, set priority to 2. If the number of failed logins from src_ip in the first 3 minutes is > 100, set priority to 5. If the number of failed logins from src_ip in the first 10 minutes is > 1000, set priority to 9.
This kind of logic would capture a brute force login attack with increasing confidence.
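To make the escalating-threshold idea concrete, here is a small Python sketch of the logic described above (Python stands in for the eventual SPL; the function name and the assumption that the third window is 10 minutes are mine, not from any Splunk API):

```python
# Escalating brute-force thresholds: (window_minutes, count_threshold, priority).
# Values are taken from the description above; the 10-minute window is assumed.
THRESHOLDS = [
    (1, 3, 2),
    (3, 100, 5),
    (10, 1000, 9),
]

def priority_for(counts_by_minute):
    """counts_by_minute[i] = failed logins from one src_ip during minute i.

    Returns the highest priority whose cumulative threshold is exceeded.
    """
    priority = 0
    for window, threshold, prio in THRESHOLDS:
        if sum(counts_by_minute[:window]) > threshold:
            priority = max(priority, prio)
    return priority
```

A burst of 4 failures in the first minute yields priority 2; 110 failures spread over the first three minutes escalates to 5, matching the increasing-confidence idea.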
What is the best way to implement something like this without having to schedule hundreds of searches (assuming you want to monitor a large variety of suspicious events)? I've been experimenting with a search like this:
host="192.168.198.1" earliest=-5m [search * earliest=-5m latest=-4m | stats count by src_ip,dest_ip,dest_port | rename count AS count1] | stats count by src_ip,dest_ip,dest_port | table src_ip,dest_ip,dest_port,count1,count
This search ran for a very long time, far longer than the 5-minute window it was examining. Would a better approach be to run saved searches that collect stats on the events in question and save them to either a lookup table or a summary index, for use by other searches that would apply the logic?
Is it possible to run a search with a latest= value that is in the future? I can see implementing something like this: A search runs every minute and checks the event id against a lookup table to see if that event triggers a more complex search. If so, it returns the search string. If there is a match, the event is piped to the search string returned from the lookup table with a latest= value x minutes into the future. That search would run until the latest= value is met and then use eval or stats to apply the logical tests. Ideally that final search could be sent to the background so the next event with a positive match in the lookup table can be processed.
Any help is appreciated.
earliest=-5m "failed login" | timechart count by src_ip | eval minsago=floor((now()-_time)/60) | lookup threshold_table minsago OUTPUT threshold priority | where count > threshold | stats max(priority) as priority by src_ip
And threshold table as:
minsago,threshold,priority
0,3,2
2,100,5
4,1000,9
seems like it would fit the textual description.
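To illustrate what the lookup plus `where count > threshold` plus `stats max(priority)` pipeline does, here is a rough Python simulation (the sample event counts are hypothetical; the table values match threshold_table above):

```python
# threshold_table rows keyed by minsago: (threshold, priority).
threshold_table = {0: (3, 2), 2: (100, 5), 4: (1000, 9)}

# Per-minute failed-login counts for one src_ip (hypothetical data).
events = [
    {"src_ip": "10.0.0.1", "minsago": 0, "count": 5},
    {"src_ip": "10.0.0.1", "minsago": 2, "count": 150},
    {"src_ip": "10.0.0.1", "minsago": 4, "count": 200},
]

# lookup threshold_table minsago OUTPUT threshold priority
# | where count > threshold
matched = []
for e in events:
    if e["minsago"] in threshold_table:
        threshold, prio = threshold_table[e["minsago"]]
        if e["count"] > threshold:
            matched.append({**e, "priority": prio})

# | stats max(priority) as priority by src_ip
best = {}
for m in matched:
    best[m["src_ip"]] = max(best.get(m["src_ip"], 0), m["priority"])
```

With these sample counts, 5 > 3 and 150 > 100 both match but 200 > 1000 does not, so the source IP comes out with priority 5.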
As far as the time range goes, yes, you can search for events in the future, but it sounds more like you really want to use real-time searches (or real-time alerts once 4.2 is out).
Update: You can just add in a streamstats command to get the cumulative count:
earliest=-5m "failed login" | bucket _time span=1m | stats count by _time,src_ip | sort + _time | streamstats current=t window=0 global=f sum(count) as count by src_ip | eval minsago=floor((now()-_time)/60) | lookup threshold_table minsago OUTPUT threshold priority | where count > threshold | stats max(priority) as priority by src_ip
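The key step in the updated search is `streamstats sum(count) as count by src_ip`, which turns the per-minute counts into a running total per source IP. A Python sketch of that behavior (the row data is hypothetical):

```python
# streamstats sum(count) by src_ip: each row gets the running total
# for its src_ip, walking the rows in ascending time order.
rows = [  # (src_ip, minute, count), already sorted by time
    ("10.0.0.1", 0, 4),
    ("10.0.0.1", 1, 120),
    ("10.0.0.2", 1, 2),
    ("10.0.0.1", 2, 900),
]

running = {}
cumulative = []
for src_ip, minute, count in rows:
    running[src_ip] = running.get(src_ip, 0) + count
    cumulative.append((src_ip, minute, running[src_ip]))
```

Here 10.0.0.1 accumulates 4, then 124, then 1024, while 10.0.0.2 stays at 2; the cumulative value is what then gets compared against the lookup's threshold.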
First off, I want to thank you and all the other folks who have answered my slews of questions. The Splunk community is incredibly supportive!
Does the "where count > threshold" command add up the contents of the preceding minutes? Let's say the failed login counts look like the following (minsago,src_ip,count):
If the count of events crosses the threshold at minsago=3, will that search show that at the end of minsago=4 the count of events was 1159?