Splunk Search

Alert Behavior differs from Search that includes Where command

ksextonmacb
Path Finder

I'm running a search that does exactly what I want. The search is:

tag = authentication | transaction host user | where _time > (now() - 3600)

The results of this search will tell me if anyone has tried to log into a server for the first time ever (as in since the dawn of time) in the past hour, provided the search is done over all of time. If no one has logged in, no results are returned. You can change the window by changing the number 3600.

I want to turn this Search into an Alert. I've tried creating both real-time and hourly searches. I tried removing the "where" command for the real time search and letting the alert functionality take over. No matter what I do, alerts ignore the where command and produce different results than the search; I simply get each login attempt over whatever time the alert scans.

I really just want the alert to run my search and alert me if it has any results. Does anyone know how to do that?

martin_mueller
SplunkTrust

In order to detect first-time login it is a very bad idea to search all data over and over again.

It's a better idea to remember when you've first (and maybe last while you're at it) seen any given user, and to compare each new chunk of data against that memory / state. That way you only look at each event once rather than infinite times.
This blog post describes the process nicely: http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/
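The "remember state, compare each new chunk" idea can be sketched outside Splunk in a few lines of Python. This is only an illustration of the logic, not the blog post's implementation; the function name `merge_first_seen` and the `(user, host, timestamp)` tuple layout are assumptions made for the example.

```python
def merge_first_seen(state, new_events):
    """Fold a new chunk of events into the remembered first-seen times.

    state:      dict mapping (user, host) -> earliest timestamp seen so far
    new_events: iterable of (user, host, timestamp) tuples (hypothetical layout)
    Returns the updated state and the list of pairs never seen before.
    """
    merged = dict(state)  # copy so the caller's state is left untouched
    newly_seen = []
    for user, host, ts in new_events:
        key = (user, host)
        if key not in merged:
            # First time this pair shows up anywhere: remember it and report it.
            newly_seen.append(key)
            merged[key] = ts
        elif ts < merged[key]:
            # Known pair, but this event is older than what we had stored.
            merged[key] = ts
    return merged, newly_seen


state = {("alice", "web01"): 100.0}
chunk = [("alice", "web01", 500.0), ("bob", "web01", 520.0), ("bob", "web01", 510.0)]
state_after, new_pairs = merge_first_seen(state, chunk)
```

Each event is examined once, and only the small state table is carried from run to run.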

ksextonmacb
Path Finder

Here's what I finally did, for the next guy.

Run this search once:

tag=authentication _time!=null user!=null host!=null | stats min(_time) as first_time by user, host | outputlookup first_time_authentication_tracking.csv

Set up this search to run every so often as an alert:

tag=authentication _time!=null user!=null host!=null | stats min(_time) as first_time by user, host | inputlookup append=t first_time_authentication_tracking.csv | stats min(first_time) as first_time by user, host | outputlookup first_time_authentication_tracking.csv

I chose to run it every 15 minutes and scan the prior 20 minutes. To do this I created an alert using cron scheduling with earliest -20m, latest +1m, scheduled for "*/15 * * * *". Then I picked the alert condition to be custom, and the custom search to be something that would never happen, which in my case was "search first_time=null".

Finally, set up another alert using the search (the leading pipe is important):

| inputlookup first_time_authentication_tracking.csv | where first_time > now()-3600

This alert will do the actual alerting, and you just alert based on returned count being greater than zero. I scheduled it for every hour and it only alerts me if the file has a new entry time stamped for the last hour.

For debugging you should definitely grab the lookup editor app to look at the CSV file. It's also useful for repopulating the results of the first, time-consuming search from a local CSV file if you wipe out the CSV file on the server.

My only concern is that, because these times are coded into the searches, some stuff might be left out. I'd prefer to figure out how to set the search time range based on the maximum time in the file, but I'm going to leave it alone for now and move on to something else.

martin_mueller
SplunkTrust

You can avoid overlaps by introducing a delay, e.g. earliest=-16m@m latest=-m@m if you know that all events will be indexed at most one minute after generation.

You can also avoid overlaps by going with index time rather than event time as the main criterion: earliest=-d latest=+d _index_earliest=-16m@m _index_latest=-m@m. That will even consider 20-hour-old events once, straight after they're indexed.
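To see why a delayed, minute-snapped window tiles cleanly when run every 15 minutes, here is a small Python sketch of the arithmetic. It assumes the usual reading of `-16m@m` (subtract the offset, then snap down to the minute) and that the window is measured from the scheduled run time; the helper name `delayed_window` and the example date are made up for illustration.

```python
from datetime import datetime, timedelta


def delayed_window(run_time, span_minutes=15, delay_minutes=1):
    """Approximate earliest=-16m@m latest=-m@m for a given run time.

    Returns (earliest, latest); the window is earliest <= t < latest.
    """
    def snap_to_minute(t):
        return t.replace(second=0, microsecond=0)

    latest = snap_to_minute(run_time - timedelta(minutes=delay_minutes))
    earliest = snap_to_minute(run_time - timedelta(minutes=span_minutes + delay_minutes))
    return earliest, latest


# Two consecutive scheduled runs, 15 minutes apart (arbitrary example date).
first = delayed_window(datetime(2015, 1, 1, 6, 0, 0))
second = delayed_window(datetime(2015, 1, 1, 6, 15, 0))
```

Here `first` ends at 05:59 and `second` starts at 05:59, so the two runs neither overlap nor leave a gap, and every event has had at least a minute to be indexed before its window is searched.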

martin_mueller
SplunkTrust

Running a search with earliest=-15m every 15 minutes will inevitably lead to missed events.

Say an event occurs at 05:59:59.999 and is indexed at 06:00:00.001 - two milliseconds delay, awesome!
The search scheduled at 06:00:00 won't find it because it's not indexed yet. The search scheduled at 06:15:00 won't find it because it's out of its time range... oops!

Use the techniques described in the blog post I linked to in order to clean up duplicates from your combined results before outputlookup'ing; then you can have overlapping time ranges in your scheduled search.
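The missed-event scenario above can be replayed in a few lines of Python. This is a simplified model, not Splunk's scheduler: times are milliseconds since midnight, an event is considered searchable once its index time is at or before the run time, and the helper name `visible` is an assumption for the example.

```python
def visible(event_ms, indexed_ms, run_ms, earliest_offset_ms, latest_offset_ms=0):
    """True if a search run at run_ms sees the event.

    The search covers run_ms - earliest_offset_ms <= event time < run_ms - latest_offset_ms,
    and can only return events that are already indexed when it runs.
    """
    already_indexed = indexed_ms <= run_ms
    in_range = run_ms - earliest_offset_ms <= event_ms < run_ms - latest_offset_ms
    return already_indexed and in_range


MINUTE = 60_000
event_ms = 6 * 60 * MINUTE - 1    # event occurs at 05:59:59.999
indexed_ms = 6 * 60 * MINUTE + 1  # and is indexed at 06:00:00.001
runs = [6 * 60 * MINUTE, 6 * 60 * MINUTE + 15 * MINUTE]  # 06:00:00 and 06:15:00

# Back-to-back 15-minute windows: neither run sees the event.
seen_back_to_back = [visible(event_ms, indexed_ms, r, 15 * MINUTE) for r in runs]

# Overlapping 20-minute windows: the 06:15 run picks it up.
seen_overlapping = [visible(event_ms, indexed_ms, r, 20 * MINUTE) for r in runs]
```

With overlapping windows an event may be seen by more than one run, which is why the combined results need to be deduplicated before they are written back to the lookup.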

ksextonmacb
Path Finder

Fixed it to use stats a second time to remove duplicates correctly. Now I run it every 15 minutes and look at the prior 20. It'd still be nice to be able to schedule a short interval search that didn't overlap on time, but I haven't seen a way to do that and this one only takes a few seconds.

I had tried to use a second stats command before purely to mirror existing examples, but it was wiping my file's first_time column. However I found out the search I had posted was writing an additional line to the CSV file for any user logins it found, producing two lines for a user to the same host, so I had to figure out how to get it to merge the two times and only write the earliest.

The fix was to use the second stats on my created field to remove the later login, as opposed to on the _time field which didn't exist in my CSV file.

ksextonmacb
Path Finder

I found this answer on Splunk Answers, and it seems to be the gist of the blog post, so I'm linking it for posterity.

http://answers.splunk.com/answers/216701/how-to-send-an-alert-email-the-first-time-since-th.html#ans...

ccraig42
Engager

An alert specifies the time range to search as part of the alert. This isn't really different from a search run with the time picker on the right (except that the time range is picked by the alert). You should be able to override that with "earliest", so:

tag = authentication earliest=-20y | transaction host user | where _time > (now() - 3600)

(Technically that only searches the last 20 years, but that's probably sufficient. "earliest" does allow absolute times, but I don't think you can specify the earliest event, which is what dispatch will use if you select "All Time" on the selector, from the command line.)

But I wouldn't do that. There's a limit to the number of transactions Splunk will track and the number of events in a transaction, and it has to keep track of the entire transaction in memory, so if anybody has more than 1000 logins it's going to lose track of it. You're probably better off with

tag = authentication earliest=-20y | stats min(_time) as firsttime by host,user | where firsttime > (now()-3600)

lguinn2
Legend

Actually, earliest=0 specifies the earliest event. That said, I agree that you may not want to search "All Time" for this. If "All Time" is truly the requirement, I would probably do something like @martin_mueller proposes.

ccraig42
Engager

That was actually the first thing I tried (setting earliest=0). It gives the same set of events as the time selector and earliestTime in the job details is the midnight this morning when "Today" is selected, even though my test data goes back about a year.

woodcock
Esteemed Legend

Forget the transaction command; it overcomplicates things and performance will be the pits. Try this, run over All time:

tag = authentication | stats earliest(_time) AS firstTime latest(_time) AS latestTime by host user | where firstTime=latestTime AND latestTime > (now() - 3600)

ksextonmacb
Path Finder

That search doesn't just find first-time logins. It finds all logins. I want an alert to find first-time authentication events for each user-host pairing. The time constraint doesn't seem to work either; it's looking at all of my data.

The transaction command is in use because that is what limits things to only the first event with the pairing. I'm okay with poor performance on a command that runs every hour if it gets me what I'm after. It seems to take about 3 minutes to run the search against 6 years of data.

woodcock
Esteemed Legend

My search does exactly what you say you need. The only way to find the "first time ever" is to look at ALL events, so that is what the search does. It says this: "for each user+host pairing, find the oldest occurrence and the newest occurrence; if these are the same, this is a 'first time ever', and of those, only show me the ones that happened in the last hour."

The transaction command does not in any sense do what you said it does, but the dedup command sort-of does. In any case, the best way to do what you say you need to do is what I showed you.
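For readers who want to check the "oldest equals newest, and within the last hour" rule against sample data, here is a rough Python rendering of that description. It is a sketch of the stated logic only; the function name `first_time_pairs`, the `(host, user, timestamp)` layout, and the sample values are invented for the example.

```python
def first_time_pairs(events, now, window=3600):
    """Return (host, user) pairs whose oldest and newest event coincide
    and whose newest event is more recent than now - window.

    events: iterable of (host, user, timestamp) tuples (hypothetical layout)
    """
    first, last = {}, {}
    for host, user, ts in events:
        key = (host, user)
        first[key] = min(ts, first.get(key, ts))
        last[key] = max(ts, last.get(key, ts))
    return sorted(k for k in first if first[k] == last[k] and last[k] > now - window)


now = 10_000
events = [
    ("web01", "alice", 100), ("web01", "alice", 9_900),  # long-standing pair
    ("web01", "bob", 9_500),                             # single login, last hour
    ("db01", "carol", 5_000),                            # single login, too old
    ("db01", "dave", 9_000), ("db01", "dave", 9_800),    # two logins, both last hour
]
result = first_time_pairs(events, now)
```

Note that under this reading a pair with two or more events inside the window (dave above) is not returned, because its oldest and newest times differ.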

woodcock
Esteemed Legend

Do you have an update? Did you reconsider the answer's validity?
