Alerting

Send an alert when the Fill ratio of data processing queues exceeds a certain percentage

jstacey_intuit
Explorer

I am using the Splunk SoS App and am interested in setting up some alerts around the "Fill ratio of data processing queues" metrics. I'd like to receive an alert when queue "X" is more than 75% full for more than 10 minutes.

hexx
Splunk Employee

Here is the search that I like to use as an alert (condition: event count > 0) to detect queue saturation for the parsing or indexing queues:

index=_internal source=*metrics.log group=queue (name=indexqueue OR name=parsingqueue) earliest=-1h 
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) 
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) 
| eval fill_perc=ceiling((curr/max)*100) 
| timechart span=30s first(fill_perc) by name 
| streamstats count(eval(parsingqueue>90)) AS parsingq_saturation_count count(eval(indexqueue>90)) AS indexq_saturation_count window=20 
| where indexq_saturation_count>19 OR parsingq_saturation_count>19
| bin _time span=10m 
| stats median(parsingqueue) AS parsingqueue median(indexqueue) AS indexqueue by _time

The logic is that this search returns one result for each 10-minute bucket in the last hour in which the parsing queue or the indexing queue was more than 90% full for 20 consecutive 30-second samples, i.e. a 10-minute time window. When that happens, the search returns the median fill percentages for both queues over the time windows where saturation was detected.
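
To match the threshold from the original question (one queue more than 75% full for 10 minutes), the same approach can be narrowed down. The following is only a sketch under a few assumptions: the queue of interest is parsingqueue, metrics.log is written at its default 30-second interval, and fill_perc and saturation_count are illustrative field names rather than anything Splunk defines.

```
index=_internal source=*metrics.log group=queue name=parsingqueue earliest=-1h
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=ceiling((curr/max)*100)
| timechart span=30s first(fill_perc) AS fill_perc
| streamstats window=20 count(eval(fill_perc>75)) AS saturation_count
| where saturation_count>19
```

With window=20, saturation_count can only reach 20 when every one of the last 20 samples was above 75%, so the alert condition "event count > 0" should fire only on sustained saturation. If your metrics interval is not 30 seconds, adjust span and window so that span times window equals the duration you care about.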

yannK
Splunk Employee

Yes, and what is the problem?
You will get a lot of false positives that way, because temporary peaks above 90% can always occur. Averaging over 10-minute buckets smooths them out:

index=_internal source=*metrics.log* group=queue | bucket _time span=10m | stats avg(current_size_kb) AS avgsize_kb max(max_size_kb) AS maxsize_kb by _time name | eval percentage=round((avgsize_kb/maxsize_kb)*100,2) | where percentage > 90

Of course, narrow the search to your time range and to the queues you want to monitor.
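
As a rough illustration of that narrowing, the sketch below restricts the averaged search to the last hour and to two queues, and uses the 75% threshold from the original question. The queue names and the time range are assumptions to replace with your own. The ratio is multiplied by 100 so that it can be compared against a percentage.

```
index=_internal source=*metrics.log* group=queue (name=parsingqueue OR name=indexqueue) earliest=-1h
| bucket _time span=10m
| stats avg(current_size_kb) AS avgsize_kb max(max_size_kb) AS maxsize_kb by _time name
| eval percentage=round((avgsize_kb/maxsize_kb)*100,2)
| where percentage > 75
```

Because this variant averages over each 10-minute bucket, a short spike to 100% will not trigger it on its own, but a bucket that is half saturated and half empty will not trigger it either. Whether that trade-off is acceptable depends on how strictly "for more than 10 minutes" has to be interpreted.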
