Hello, I am trying to create an alert in Splunk that fires every time a job fails two or more times within an hour. We have several different jobs running. Right now, I have a table showing each job and its number of failures:
index=?? uuid=* | rex "message=(?<message>.*)" | stats count(eval(status=="failed")) AS Failures by workflow_name | table workflow_name, Failures
This displays something like:
workflow_name   Failures
workflow_1      3
workflow_2      1
workflow_3      7
How can I change this so it only includes workflows that have failed more than once (workflow_1 and workflow_3) within a specific time frame (1 hour)? I would also like to pull in details of each workflow's latest failure (e.g., its message, uuid, etc.). For example:
workflow_name   Failures   Latest message   Latest uuid
workflow_1      3          error msg        12345678
workflow_3      7          error msg        98765432
A where clause at the end of your query should do it: | where Failures > 1. Then you can save the search as an alert and schedule it to run over whatever time frame you need (for example, every hour over the last 60 minutes).
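Putting both requirements together, a search along these lines might work. This is a sketch under a few assumptions: failed events carry status=failed (as the eval in the question implies), the elided index name stays as ?? for you to fill in, and the message field still comes from the question's rex. Filtering to failed events in the base search is what makes stats latest() return the message and uuid of the most recent failure for each workflow, rather than the most recent event of any status.

```spl
index=?? uuid=* status=failed earliest=-1h
| rex "message=(?<message>.*)"
| stats count AS Failures latest(message) AS "Latest message" latest(uuid) AS "Latest uuid" by workflow_name
| where Failures >= 2
| table workflow_name Failures "Latest message" "Latest uuid"
```

To turn this into an alert, save it with an hourly schedule and a trigger condition of "Number of Results is greater than 0". Note that a fixed hourly schedule only catches failures that land in the same window, so two failures at 10:55 and 11:05 would be missed. If you need a rolling one-hour window, one common approach is to run the same search more often (say, every 5 minutes) and enable alert throttling on workflow_name so the same workflow does not trigger repeatedly.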