Splunk Search

Tracking failed logins followed by successful logins using the transaction command: how can I make the search more efficient?

Motivator

I've been experimenting with the transaction command to track failed logins followed by successful logins, and right now I am running the following search:

index=  | transaction src_ip,user startswith="Login failed " endswith="Login succeeded" maxspan=15m maxpause=8h | stats avg(duration)

I know there are always more efficient approaches, or various twists one can put on a search, so I was wondering if anyone might provide some feedback on a) what can be done to make the search more efficient, or b) how to put a twist on the search to pull other interesting information about failed logins followed by successful logins.

Thx

New Member

Any idea how to track failed logins which are NOT followed by successful logins using the transaction command?
This would give us the true count of failed logins.

thanks

Path Finder

Your search is good in that it captures failures followed by a successful login, but your time constraints are a bit odd. I put together this very type of search some time ago, and I see a few possible design issues with your query.

The first is your windowing specification. With maxspan=15m, the whole transaction can last at most 15 minutes, so no gap between its events can ever approach 8 hours and the maxpause=8h clause never takes effect.
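
As a rough sketch of that point, the original search could be written with maxspan alone. The index=<your_index> placeholder and the leading keyword filter are assumptions added for illustration; they are not part of the original query.

```spl
index=<your_index> ("Login failed" OR "Login succeeded")
| transaction src_ip, user startswith="Login failed" endswith="Login succeeded" maxspan=15m
| stats count as transactions, avg(duration) as avg_duration
```

Filtering down to the two login messages before transaction should leave the command fewer events to group, at the cost of dropping any other events that would otherwise be pulled into each transaction. For the slower failure-then-success case, the same search could be run separately with a larger maxspan.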

The second is any thresholding for failures before a success. Do you care about how many failures or will any do? This will need to take into consideration your systems' configuration: how many failures before lock out? Is it indefinite?

The third is to make sure you know specifically what you're logging. Pay attention to the authentication methods: if a client cycles through several methods, the logs will show a failure for each one until a different method succeeds.

I would consider a few things to augment your search here:

  • Use smaller time windows to make it easier to track logons, i.e., X failed logon attempts within Y minutes. You can also consider multiple searches: one against smaller time windows and another for a larger, more aggregate window.
  • Use eventstats to count the number of failures and set a threshold on it. You could do the same for the number of hosts attempted, or for duration. You have plenty of things to consider here, especially if you want to run this in real time.
  • Consider filtering for or against specific authentication methods (if ssh).
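
To make the "X failed logon attempts within Y minutes" idea concrete, here is a minimal sketch that uses bin and stats instead of transaction. The index=<your_index> placeholder, the 5-minute bucket, and the threshold of 5 are all hypothetical values chosen for illustration.

```spl
index=<your_index> "Login failed"
| bin _time span=5m
| stats count as failures by _time, src_ip, user
| where failures >= 5
```

Note that bin creates fixed 5-minute buckets rather than a sliding window, so a burst of failures that straddles a bucket boundary is split across two rows. A second search with a larger span could cover the more aggregate view.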

I presume your logon events here are Linux or sshd based. You may need to consider using calculated fields to normalize logon success across sources. The Splunk add-ons for Linux and Windows should have what you're looking for here as well.

Please let me know if any of my input rings well with you.

Explorer

I'm attempting to do something similar to what you've suggested, using eventstats to set a threshold on failed logons, but I don't understand enough about Splunk syntax to get it to work. What I tried, which obviously errored out, is:

index= | transaction src_ip,user startswith=["Login failed" | eventstats count(src_ip) as count | where count > 10] endswith="Login succeeded" maxspan=30m

How and where would I insert eventstats to count the 10 failed logons as the transaction start?

Motivator

Thx for the reply and info.

Re: time window spec, I'm trying to look for both login failures followed by successful logins that happen quickly, and login failures followed by successful logins with a larger gap in time between the failed login and the successful login. I thought maxspan/maxpause would allow me to play with different time periods.

Answering your first question, I both do and don't care about how many failures there are, because the query will be applied to logins with different authentication types, some for accounts that get locked out and some for accounts that do not.

How can I use eventstats in the query?

Thx again

Path Finder

Your current use of maxspan and maxpause conflicts. You have two drastically different use cases, which in turn will require very different amounts of resources to complete. It's always best to iterate, so that it's easier to track where a query isn't efficient enough, especially at scale.

I mentioned authentication types so that you can filter out failed auth messages that are normal, since the protocol will run through different auth types until it reaches interactive or password, for example. Those extra failures are simply noise and would only skew any statistics you run to characterize login behavior with respect to true login failure vs. success.

eventstats works very much like stats, except that it does not transform the events: it computes count and other stats function results and adds them to each event as new fields. Think of it as an in-line way to attach aggregate metrics to your events. The key is choosing which field or fields to aggregate by.
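
As a minimal sketch of where eventstats could sit relative to transaction, one option is to count failures per src_ip and user first, filter on that count, and only then build transactions. The index=<your_index> placeholder and the is_failure and failure_count field names are assumptions introduced here for illustration; the threshold of 10 and maxspan=30m come from the attempt earlier in this thread.

```spl
index=<your_index> ("Login failed" OR "Login succeeded")
| eval is_failure=if(searchmatch("Login failed"), 1, 0)
| eventstats sum(is_failure) as failure_count by src_ip, user
| where failure_count > 10
| transaction src_ip, user startswith="Login failed" endswith="Login succeeded" maxspan=30m
```

One caveat with this sketch: failure_count is computed per src_ip and user pair over the whole search time range, not per 30-minute window, so a tighter search time range or a bin on _time may be needed if the threshold has to hold within a short period.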

It's a non-trivial, but highly effective function to use against your search data.

I would recommend reading the Splunk docs and answers to get a better feel for when and how to use it. The most important suggestion I can make is to filter your search data appropriately first, and then aggregate by the specific fields you want to isolate.

There are other ways to approach this; I've just found that eventstats is my preferred method for this type of query.