Alerting

Problem resolution alert using Splunk, sent via API

Amit_Sharma1
Observer

Hi Splunkers,

I am working on an alert which calculates the error rate (> 30%) and sends alerts to PagerDuty via API.

index="test1" source="mylogs"  NOT "TEST" earliest=-30m latest=now (":search-result match-indicator=\"PASS\" *******query*******
| stats count(id) AS OKStats
| appendcols [ search source=mylogs (":search-result match-indicator=\"ERROR\" *******query*******) NOT "TEST" earliest=-30m latest=now | stats count(id) AS ERRORStats]
| eval TotalTransactions = OKStats + ERRORStats
| eval ErrorRate = if(TotalTransactions > 0, round((ERRORStats / TotalTransactions) * 100, 2), 0)
| where ErrorRate >= 30
| eval dedup_key="HighErrorRate"
| table ErrorRate, dedup_key

Now, to clear the alert, I created another alert (< 30%):

index="serverlogs" source="mylogs"  NOT "TEST" earliest=-30m latest=now (":search-result match-indicator=\"PASS\" ************myqueryparams************
| stats count(id) AS OKStats
| appendcols [ search source=mylogs (":search-result match-indicator=\"ERROR\" ************myqueryparams************) NOT "TEST" earliest=-30m latest=now | stats count(id) AS ERRORStats]
| bin _time span=5m
| eval ErrorPercent = if((OKStats + ERRORStats) > 0, round(ERRORStats / (OKStats + ERRORStats) * 100, 2), 0)
| sort -_time
| streamstats window=2 latest(ErrorPercent) as latest_percent, latest(_time) as latest_time, earliest(ErrorPercent) as earliest_percent
| where latest_time = _time AND latest_percent < 30 AND earliest_percent >= 30
| head 1
| eval dedup_key="HighErrorRate"
| table latest_percent, earliest_percent, dedup_key

I created the two alert conditions, both sending to PagerDuty (trigger when number of rows > 1), running every 30 minutes with throttling enabled. I do not see the second alert firing or clearing the alert.

Any advice on how to achieve the clearing of alerts, meaning the alert should be cleared on PagerDuty? Creating a Python script is currently out of scope due to security reasons, so I was trying to do it via a Splunk query.

Regards,
Amit


livehybrid
SplunkTrust

Hi @Amit_Sharma1 

To prevent the second search from repeatedly firing when the threshold is not met, it needs some way of knowing that an alert was previously triggered.

The way I would probably do this is to write to a lookup table as part of the first search, setting a flag that indicates the high-error alert has fired, e.g.:

| eval isAlerting=1
| outputlookup myalert_status
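
Appended to the end of your first alert search, that would look roughly like this (just a sketch; myalert_status is an example name and, depending on your environment, may need to be a .csv filename or a lookup defined in transforms.conf):

``` base search, stats, TotalTransactions and ErrorRate evals from the first alert go here ```
| where ErrorRate >= 30
| eval dedup_key="HighErrorRate"
| table ErrorRate, dedup_key
``` this point is only reached when the error rate is >= 30%, so record that the alert has fired ```
| eval isAlerting=1
| outputlookup myalert_status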

Then, in your second search, check whether the alert has previously fired; if it has, clear it:

index="serverlogs" source="mylogs"  NOT "TEST" earliest=-30m latest=now (":search-result match-indicator=\"PASS\" ************myqueryparams************
| stats count(id) AS OKStats
| appendcols [ search source=mylogs (":search-result match-indicator=\"ERROR\" ************myqueryparams************) NOT "TEST" earliest=-30m latest=now | stats count(id) AS ERRORStats]
| bin _time span=5m
| eval ErrorPercent = if((OKStats + ERRORStats) > 0, round(ERRORStats / (OKStats + ERRORStats) * 100, 2), 0)
| sort -_time
| streamstats window=2 latest(ErrorPercent) as latest_percent, latest(_time) as latest_time, earliest(ErrorPercent) as earliest_percent
| where latest_time = _time AND latest_percent < 30 AND earliest_percent >= 30
| head 1
| eval dedup_key="HighErrorRate"
| table latest_percent, earliest_percent, dedup_key
``` your search above ```
| appendcols [|inputlookup myalert_status]
| stats first(*) as *
| where isAlerting=1
``` Will not get past here if there is no previous alert fired ``` 
| eval isAlerting=0
| outputlookup myalert_status
``` Cleared the alert so it will not run multiple times ```
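
One extra thing worth checking (an assumption on my part about your setup): the | inputlookup in the second search will only return a row once the lookup actually exists, so you could seed it once with isAlerting=0, for example:

| makeresults
``` single placeholder row; drop _time and write the initial flag ```
| eval isAlerting=0
| fields - _time
| outputlookup myalert_status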

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


Amit_Sharma1
Observer

Thanks livehybrid.

I have applied the changes and so far have not received any alert. I will wait for the threshold to be breached and see whether it clears the alert based on the lookup entry.

So far I have tested it manually by decreasing the threshold, and it created a lookup entry with the alert status changing from 1 to 0.

Thanks,
Amit


PrewinThomas
Motivator

@Amit_Sharma1 

Since Splunk doesn’t remember past alerts, it just fires based on current query results. If no results are returned, no alert is triggered.

Can you try a single scheduled alert that sends a "trigger" (error rate ≥ 30%) or a "resolve" (error rate < 30%) to PagerDuty with the same dedup_key?

 

| eval status=if(ErrorRate>=30,"triggered","resolved")
| eval dedup_key="HighErrorRate"
| table status, dedup_key, ErrorRate
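
On top of the base search and stats from your first alert it would look roughly like this (a sketch; note that the "| where ErrorRate >= 30" filter is dropped so the search always returns a row, and the PagerDuty alert action/webhook is assumed to map status to the matching trigger/resolve event using the same dedup_key):

``` base search and stats from the first alert go here ```
| eval TotalTransactions = OKStats + ERRORStats
| eval ErrorRate = if(TotalTransactions > 0, round((ERRORStats / TotalTransactions) * 100, 2), 0)
``` always keep one row; decide between trigger and resolve based on the rate ```
| eval status=if(ErrorRate>=30,"triggered","resolved")
| eval dedup_key="HighErrorRate"
| table status, dedup_key, ErrorRate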


Regards,
Prewin
If this answer helped you, please consider marking it as the solution or giving a Karma. Thanks!


Amit_Sharma1
Observer

Thanks @PrewinThomas  for looking and replying.

With this solution, it will keep triggering the resolution alert every 30-minute window (based on my alert condition), since the rate is below the threshold most of the time.

My requirement is as below:
I need an alert to trigger as soon as the error rate crosses 30%, and when it drops below 30% the existing alert on PagerDuty should be cleared; this check should run every 30 minutes.

I do not need any alert (PagerDuty notification) while the rate is below 30%; below 30% should only clear an existing alert.
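
Conceptually, the clearing alert I am after is something like this rough sketch (combining the lookup flag from above with the status field; the base search and error-rate calculation are the same as in my first alert):

``` base search, stats and ErrorRate calculation as in the first alert ```
| appendcols [| inputlookup myalert_status]
``` only send a resolve when the rate is back below 30% AND a trigger previously fired ```
| where ErrorRate < 30 AND tonumber(isAlerting)==1
| eval status="resolved"
| eval dedup_key="HighErrorRate"
``` reset the flag so the resolve is only sent once ```
| eval isAlerting=0
| outputlookup myalert_status
| table status, dedup_key, ErrorRate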

Thanks,
Amit
