We are using EMR Spark, all the logs go to Splunk, and multiple types of jobs run on the cluster. I want to set up a Splunk alert so that if more than 5% of the total number of jobs fail, we get an alert.
Hi,
Thanks for your reply.
Using the following query I am able to get all the failed jobs:
index=emr | search applicationType=SPARK finalStatus=FAILED
In our environment more than 300 jobs run per day (batch jobs and streaming jobs).
#1: I want to set up an alert that triggers when the failed-job count reaches 5%.
#2: The number of jobs can fluctuate; some days the total count is more than 300 and some days it is less. So the 5% should be calculated against the actual count for that day, e.g. if the day's total count is 280, the threshold should be 5% of 280.
Please give me the query I need to run.
Thanks in advance.
In general, since the question does not contain specifics about the data, you'll need a count of all jobs as well as a count of jobs that failed. Use math to find the failure percentage.
your base search (all Spark jobs, not just the failed ones)
| eval fail = if(finalStatus=="FAILED", 1, 0)
| stats count AS total, sum(fail) AS failures
| eval pct = failures * 100 / total
| where pct > 5
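As a concrete sketch using the index and fields from your question (index=emr, applicationType, finalStatus; the exact values are assumptions based on your posted search), a version scheduled daily over the previous calendar day could look like:

index=emr applicationType=SPARK earliest=-1d@d latest=@d
| eval fail = if(finalStatus=="FAILED", 1, 0)
| stats count AS total, sum(fail) AS failures
| eval pct = round(failures * 100 / total, 2)
| where pct > 5

Save this as a scheduled alert that runs once per day and set the trigger condition to "number of results is greater than 0". Because the percentage is computed from the day's actual total, it works whether that day has 280 jobs or more than 300.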