Goal:
If "[FATAL]" FTP message to same destination host "host-xyz" is found 3 times within 1 minute, then trigger alert to send email to admin.
Alert results:
Results should be grouped by time, showing which hostnames failed within each 1-minute period. For example:

Time        host
TIMESTAMP   host-xyz
TIMESTAMP   host-albert
            host-jimbob
TIMESTAMP   host-abc
My problem:
1) I am getting most of what I need from my query, but I don't know how to organize the results to display as I describe above.
2) I don't think I am counting 3 events within 1 minute correctly, as my current alert results below show.
Sample Log data events:
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz 1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-jojo 1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz 1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-jojo 1/1 attempt
2018-Mar-19 20:17:56 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz 1/1 attempt
...etc...
Field extract created:
I created a field called 'failed_host' to hold the hostname found in an event (e.g. host-xyz).
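For reference, the extraction is roughly equivalent to an inline rex like this (the regex here is only illustrative, based on the sample events above, not the exact extraction I configured):

index=milo sourcetype=rto FATAL
| rex "could not FTP file to (?<failed_host>\S+)"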
Current query:
index=milo sourcetype=rto FATAL earliest=-30m@d latest=now
| bucket _time span=1m
| stats count by failed_host _time
| eval occurred=if(count!=3,"FTP failed", null())
| where isnotnull(occurred)
| table occurred failed_host _time count
Current alert results:
     occurred    failed_host  _time                count
1    FTP failed  abc837       2018-03-12 08:03:00  2
2    FTP failed  abc837       2018-03-12 08:04:00  2
3    FTP failed  abc840       2018-03-19 17:17:00  2
4    FTP failed  abc840       2018-03-19 17:18:00  2
5    FTP failed  abc841       2018-03-19 17:17:00  2
6    FTP failed  abc841       2018-03-19 17:18:00  2
7    FTP failed  abc842       2018-03-12 08:03:00  2
8    FTP failed  abc842       2018-03-12 08:04:00  2
9    FTP failed  abc844       2018-03-12 08:03:00  4
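(Looking at the eval again, I suspect the count!=3 test is part of the counting problem: it keeps every one-minute bucket whose count is anything other than exactly 3, which is why rows with count=2 show up. What I was aiming for is probably a threshold more like:

index=milo sourcetype=rto FATAL earliest=-30m@d latest=now
| bucket _time span=1m
| stats count by failed_host _time
| where count>=3
| table failed_host _time count

but even with that fixed, I still don't know how to group the output by time.)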
I would recommend using the transaction command, as it seems to do exactly what you need.
So I would change this query:
index=milo sourcetype=rto FATAL earliest=-30m@d latest=now
| bucket _time span=1m
| stats count by failed_host _time
| eval occurred=if(count!=3,"FTP failed", null())
| where isnotnull(occurred)
| table occurred failed_host _time count
to something more like:
index=milo sourcetype=rto FATAL earliest=-30m@m
| transaction failed_host maxspan=1m
| search eventcount >= 3
| table failed_host _time eventcount
Now Splunk will look for transactions of the same failing host within 1 minute (= maxspan) and combine them into a single event, which includes the eventcount field counting the number of events in the transaction. You may also find the duration field interesting (I left it out of the query), since it tells you exactly how long the transaction lasted.
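For example, to see it you could just add it to the table at the end:

index=milo sourcetype=rto FATAL earliest=-30m@m
| transaction failed_host maxspan=1m
| search eventcount >= 3
| table failed_host _time eventcount duration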
I hope it helps you!
Omer
edit:
To organize the results into groups by time, I would add this to the end of my query:
| bin _time span=1m | stats list(*) as * by _time
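So the whole search, with the grouping added, would look something like:

index=milo sourcetype=rto FATAL earliest=-30m@m
| transaction failed_host maxspan=1m
| search eventcount >= 3
| table failed_host _time eventcount
| bin _time span=1m
| stats list(*) as * by _time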
Omeri,
You nailed it! My customer is very happy and so am I.
Your response was quick, and your suggestion was easy to implement and dead on. And the extra edit added at the bottom made it even better. The report looks sweet.
I really appreciate this help.
Thanks much,
Damon