Gurus
I just started playing with Splunk, and after reading the alerting how-to, a real-time (rolling-window) alert looks like a good place to start.
I tested a simple "Failed password" scenario in which more than 2 failed logins within 60 seconds should trigger an alert.
This works as expected for all usernames: if the same user fails to log on 3 times within 60 seconds, it sends an email.
However, it also sends an email when 3 different users fail to log on within that time frame. I'm pretty sure "stats count per user" is the answer here, but when I add that to "Failed password" in my search, nothing triggers any more. Not even when the same user fails 3 times within 60 seconds.
I believe a stats table gets created, as described here:
http://docs.splunk.com/Documentation/Splunk/latest/User/Alertusecases
This article makes it sound like I can just add this pipe to my query to make the alert aware of whether it's the same user failing or several different users. Clearly I'm missing something.
I might have a misconception here, but shouldn't I be able to view this table in the alert dashboard? Where can I see the results in this table?
The alert is "Failed password | stats count per user". As soon as I remove the pipe, it starts working as before.
Any hint is appreciated.
Thx
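For reference, the behaviour being asked for (alert only when the same user fails more than 2 times within 60 seconds, never when failures from different users add up) can be sketched in plain Python. This is not how Splunk evaluates real-time searches internally; it only illustrates the per-user counting that "stats count by user" plus a "count > 2" condition is meant to express. The function name and sample data are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # length of the rolling window
THRESHOLD = 2        # alert when a user has MORE than this many failures

def users_to_alert(events):
    """events: (timestamp_in_seconds, user) pairs, sorted by timestamp.

    Returns the users who had more than THRESHOLD failed logins within
    any WINDOW_SECONDS-long window. Counts are kept per user, so
    failures from different users never add up."""
    recent = defaultdict(deque)  # user -> timestamps still inside the window
    alerted = set()
    for ts, user in events:
        window = recent[user]
        window.append(ts)
        # Forget failures that are 60 or more seconds older than this one.
        while ts - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) > THRESHOLD:
            alerted.add(user)
    return alerted

# Hypothetical sample data: three failures within 60 seconds.
same_user = [(0, "alice"), (20, "alice"), (40, "alice")]
different_users = [(0, "alice"), (20, "bob"), (40, "carol")]

print(users_to_alert(same_user))        # {'alice'}
print(users_to_alert(different_users))  # set()
```

Keeping one window per user is what separates "same user 3 times" from "3 different users once each"; a single shared counter would reproduce the original false alert.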
OK, it has to do with the hostname field. Let's start with a basic search:
"Failed password"
This returns 367 results, such as:
Aug 14 16:53:17 hostname sshd[31840]: Failed password for invalid user test_tuesday from {srcip} port 56847 ssh2
When I try to get a per-user count by changing this to:
"Failed password" | stats count by user
for the same time frame, I get the following table:
user     count
user1    8
user2    1
That's it. When I drill down on the first user, I get a message that looks almost like the one above, but it's from a few weeks ago:
Jul 24 17:31:11 hostname.domain.com sshd[1329]: Failed password for invalid user user1 from 10.91.25.76 port 54427 ssh2
Then I noticed that this older message has the FQDN as its hostname, and the newer ones don't.
For a short period I was sending FQDNs in the syslog messages, but I changed that back soon after. Now it appears that only messages containing the FQDN get the user field extracted. That would explain why neither the 1-minute nor even a 7-day time frame yields any results.
I double checked and the same thing is true for the other user.
Isn't the hostname usually short in standard syslog? Unless '.' is used as a delimiter, I don't see how that would affect the extraction of the user field.
Am I onto something?
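One way to check the hostname theory outside Splunk is to run a regex over both sample lines. The sketch below is plain Python and does not reproduce Splunk's own field extraction; the pattern and function name are hypothetical. It is anchored on the sshd message text rather than on the hostname, so a short name and an FQDN should behave the same. If the extraction in use only matches one of the two formats, it is probably keyed on the hostname.

```python
import re

# Hypothetical extraction pattern for the sshd message. It is anchored on the
# text after "sshd[pid]:", so it does not care whether the hostname field is
# a short name or an FQDN.
FAILED_PASSWORD = re.compile(
    r"sshd\[\d+\]: Failed password for (?:invalid user )?(?P<user>\S+) from "
)

def extract_user(line):
    """Return the user name from an sshd "Failed password" line, or None."""
    match = FAILED_PASSWORD.search(line)
    return match.group("user") if match else None

# The two sample lines from this thread ({srcip} left as posted).
short_host = ("Aug 14 16:53:17 hostname sshd[31840]: Failed password for "
              "invalid user test_tuesday from {srcip} port 56847 ssh2")
fqdn_host = ("Jul 24 17:31:11 hostname.domain.com sshd[1329]: Failed password "
             "for invalid user user1 from 10.91.25.76 port 54427 ssh2")

print(extract_user(short_host))  # test_tuesday
print(extract_user(fqdn_host))   # user1
```

The optional "invalid user " group lets the same pattern also cover failures for existing accounts, e.g. "Failed password for root from ...".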
Ayn
I had already removed the index and sourcetype and tried again.
As per my last post:
"I noticed that when I remove the "index=foo sourcetype=goo" part and test again, the 3 events show up in the timeline. They still don't show in the results field or get emailed, though."
Search : "Failed password" | stats count by user
Start time : rt-1m
End time : rt-0m
Condition : If condition is met
Custom condition search : search count > 2
Alert mode : once per search
This does show all 3 events in the timeline (linear scale), but the area where you usually see the raw messages still says "No results found". I'm pretty sure this is why no email is triggered, since there would be no raw message to send, right?
It looks like piping to stats count removes the raw messages and turns them into just a counter.
Are you saying I should get an alert whenever I see an event show up in the timeline?
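That observation about stats is correct: stats returns aggregated rows, not the raw events, and an event whose user field was not extracted ends up in no row at all. As far as I understand Splunk alerting, the custom condition search then runs against those result rows, so a missing raw message is not by itself what blocks the email. A rough plain-Python analogy (function names and sample events are hypothetical, not Splunk APIs):

```python
from collections import Counter

def stats_count_by_user(events):
    """Rough analogue of '| stats count by user': one row per user value.
    Events without a 'user' field are skipped, and the raw text is gone."""
    counts = Counter(e["user"] for e in events if "user" in e)
    return [{"user": user, "count": n} for user, n in counts.items()]

def custom_condition(rows, threshold=2):
    """Rough analogue of the custom condition 'search count > 2'."""
    return [row for row in rows if row["count"] > threshold]

# Hypothetical events; the last one stands for a line where 'user' was not extracted.
events = [
    {"_raw": "... Failed password for invalid user user1 ...", "user": "user1"},
    {"_raw": "... Failed password for invalid user user1 ...", "user": "user1"},
    {"_raw": "... Failed password for invalid user user1 ...", "user": "user1"},
    {"_raw": "... Failed password for invalid user test_tuesday ..."},
]

rows = stats_count_by_user(events)
print(rows)                    # [{'user': 'user1', 'count': 3}]
print(custom_condition(rows))  # [{'user': 'user1', 'count': 3}]
```

If none of the recent events carry a user field, the row list is empty and the condition can never match, which fits the extraction problem described earlier in the thread.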
Is the "user" field being extracted properly? Also check the case of the field name; field names are case-sensitive (user, User, USER).
Damien
Thanks for your reply. I have the following now:
Search : index=foo sourcetype=goo "Failed password" | stats count by user
Start time : rt-1m
End time : rt-0m
Condition : If condition is met
Custom condition search : search count > 2
Alert mode : once per search
When I fail to log on to a system 3 times in 60 seconds, the dashboard now doesn't show any events, and nothing gets emailed.
I noticed that when I remove the "index=foo sourcetype=goo" part and test again, the 3 events show up in the timeline. They still don't show in the results field or get emailed, though.
That's right, thanks Ayn 🙂
The "index=foo sourcetype=goo" terms were just examples of what you could put in your search, as in "let's say you have logs with sourcetype 'goo' in your index 'foo'". You have to modify the search terms to fit your situation. If searching for just "Failed password" worked fine for you, modify the search to use that as the search term again.
Create a scheduled search like:
index=foo sourcetype=goo "Failed Password" | stats count by user
Select Condition -> "if custom condition is met".
And enter this as the Custom condition search :
search count > 2
I realized the example also pipes this to a table, which I hadn't done, so I tried this:
Failed password | stats count by user | table user
But still no alerts. Do I need to read the table manually? Sorry if this is a stupid question...
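A likely reason the `table user` version can never alert: `table user` keeps only the user column, so by the time the custom condition `search count > 2` runs, there is no count field left to compare and nothing matches. Keeping the field (e.g. `| table user count`) or leaving `table` out avoids this; you shouldn't need to read the table manually. The same effect in plain Python (function names hypothetical, a rough analogy rather than Splunk's implementation):

```python
def table(rows, *fields):
    """Rough analogue of '| table user': keep only the listed fields."""
    return [{f: row[f] for f in fields if f in row} for row in rows]

def count_over(rows, threshold=2):
    """Rough analogue of 'search count > 2': a row with no count never matches."""
    return [row for row in rows if "count" in row and row["count"] > threshold]

# Hypothetical output of '| stats count by user'.
rows = [{"user": "user1", "count": 3}, {"user": "user2", "count": 1}]

print(count_over(rows))                          # [{'user': 'user1', 'count': 3}]
print(count_over(table(rows, "user")))           # []
print(count_over(table(rows, "user", "count")))  # [{'user': 'user1', 'count': 3}]
```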