Hi All,
I created the alert below, which works fine:
index=rxc sourcetype="rxc_app" response_status=* [| inputlookup a.csv | rename site as header | fields header] earliest=-15m@m latest=now
| stats count AS Total count(eval(response_status like "5%")) AS Error_Count by endpoint
| eval Error_perc=round((Error_Count/Total)*100)
| fields endpoint Error_Count Error_perc
| where Error_perc>1 AND Error_Count>25
| join endpoint [search index=rxc sourcetype="rxc_app" response_status=5* [| inputlookup a.csv | rename site as header | fields header] earliest=-15m@m latest=now | stats count(eval(response_status like "5%")) AS Error, values(header) AS name by endpoint | where Error>25]
| table endpoint name Error Error_perc
However, errors usually spike for a minute or two and then subside. So I want logic that checks the error percentage every five minutes and triggers an alert only if the threshold has been breached for five consecutive minutes.
For example, in the first five minutes the error percentage was 10% and the error count was 485. In the next five minutes both the error percentage and the error count decreased, so no alert should trigger; if the breach had been continuous, it should.
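One way to express "breached in every minute of the last five" is sketched below. It assumes the alert is scheduled every five minutes over the last five whole minutes, and that a breach means the same condition as the original where clause (Error_perc>1 AND Error_Count>25), evaluated per one-minute bucket. The field names breach, breached_minutes and max_Error_perc are hypothetical, and the Error_Count threshold will probably need lowering, since each bucket now covers one minute instead of fifteen.

```spl
index=rxc sourcetype="rxc_app" response_status=* [| inputlookup a.csv | rename site as header | fields header] earliest=-5m@m latest=@m
| bin _time span=1m
| stats count AS Total count(eval(response_status like "5%")) AS Error_Count by _time endpoint
| eval Error_perc=round((Error_Count/Total)*100)
| eval breach=if(Error_perc>1 AND Error_Count>25, 1, 0)
| stats sum(breach) AS breached_minutes max(Error_perc) AS max_Error_perc sum(Error_Count) AS Error_Count by endpoint
| where breached_minutes=5
```

The time range earliest=-5m@m latest=@m snaps to whole minutes, so bin produces at most five one-minute buckets per endpoint. Because stats only emits a row for minutes that contain events, a minute with no traffic counts as not breached, and an endpoint is returned only when all five buckets are in breach. The alert's trigger condition can then be "number of results is greater than 0".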
I think I have managed to write the query I wanted; I just have one more question.
index=rxc sourcetype="rxc_app" response_status=* [| inputlookup a.csv | rename site as header | fields header] earliest=-15m@m latest=now
| stats count AS Total count(eval(response_status like "5%")) AS Error_Count by endpoint
| eval Error_perc=round((Error_Count/Total)*100)
I want to trend Error_perc over time so that I can work out a threshold. How can I do that? For example, I want to see the error percentage for each endpoint every 5 minutes.
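A minimal sketch for trending the percentage per endpoint in 5-minute buckets is below. It reuses the calculation from the query above; the 24-hour window (earliest=-24h@h) is only an illustrative assumption, chosen so there are enough points to judge a threshold, and rounding to two decimals is a choice made here so small percentages near a 1% threshold stay visible.

```spl
index=rxc sourcetype="rxc_app" response_status=* [| inputlookup a.csv | rename site as header | fields header] earliest=-24h@h latest=now
| bin _time span=5m
| stats count AS Total count(eval(response_status like "5%")) AS Error_Count by _time endpoint
| eval Error_perc=round((Error_Count/Total)*100, 2)
| xyseries _time endpoint Error_perc
```

bin groups events into 5-minute buckets, stats computes Total and Error_Count per bucket and endpoint, and xyseries pivots the result into one row per _time with one Error_perc column per endpoint, which can be shown as a line chart.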