How to alert if all the queues for a respective indexer gets full?

Navanitha · 12-26-2022

I need to create an alert that fires when all of the queues below are at 100% for a given indexer. I am using the built-in "DMC Alert - Saturated Event-Processing Queues" alert, but I need to tweak it so that it fires only when all 4 queues ("aggQueue.*", "indexQueue.0*", "parsingQueue.*", and "typingQueue.0") are at 100% for that host.

Query -

| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search title=tcpin_queue* OR title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2)
| fields title fifteen_min_fill_perc splunk_server
| where fifteen_min_fill_perc > 99
| rename splunk_server as Instance, title AS "Queue name", fifteen_min_fill_perc AS "Average queue fill percentage (last 15min)"

Output -

Queue name	Average queue fill percentage (last 15min)	Instance

aggQueue.0	99.98	x
aggQueue.1	100.00	x
aggQueue.2	99.99	x
indexQueue.0	100.00	x
indexQueue.1	99.98	x
indexQueue.2	99.97	x
parsingQueue.0	100.00	x
parsingQueue.1	99.82	x
parsingQueue.2	99.98	x
typingQueue.0	99.96	x
typingQueue.1	99.99	x
typingQueue.2	99.96	x
aggQueue.0	100.00	y
aggQueue.1	100.00	y
aggQueue.2	100.00	y
indexQueue.0	100.00	y
indexQueue.1	100.00	y
indexQueue.2	100.00	y
parsingQueue.0	100.00	y
parsingQueue.1	100.00	y

gcusello · 12-26-2022

Hi @Navanitha,

I use this search:

index=_internal  source=*metrics.log sourcetype=splunkd group=queue 
| eval name=case(name=="aggqueue","2 - Aggregation Queue",
 name=="indexqueue", "4 - Indexing Queue",
 name=="parsingqueue", "1 - Parsing Queue",
 name=="typingqueue", "3 - Typing Queue",
 name=="splunktcpin", "0 - TCP In Queue",
 name=="tcpin_cooked_pqueue", "0 - TCP In Queue") 
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) 
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) 
| eval fill_perc=round((curr/max)*100,2) 
| bin _time span=1m
| stats median(fill_perc) AS "fill_percentage" max(max) AS max max(curr) AS curr by host, _time, name
| where (fill_percentage>70 AND name!="4 - Indexing Queue") OR (fill_percentage>70 AND name="4 - Indexing Queue")
| sort -_time

Ciao.

Giuseppe

tscroggins · 12-26-2022

@Navanitha

Removing tcpin_queue* and counting the number of distinct base queue names by Splunk instance should allow you to alert when all 4 queues across any number of pipelines have breached your threshold:

| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search ```title=tcpin_queue* OR``` title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2) 
| fields title fifteen_min_fill_perc splunk_server 
| where fifteen_min_fill_perc > 99
| rex field=title "(?<basename>[^.]+)" 
| eventstats dc(basename) as distinct_count by splunk_server
| where distinct_count==4
| fields - basename distinct_count
| rename splunk_server as Instance, title AS "Queue name", fifteen_min_fill_perc AS "Average queue fill percentage (last 15min)"

I've added the rex, eventstats, where, and fields commands on lines 6-9 to your original search.
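To make the effect of those added commands concrete, here is a minimal Python sketch of the same logic, run against the rows pasted in the original post. It is only a model of the search, not Splunk code: the function name hosts_with_all_base_queues and the row layout are made up for illustration. The rex step keeps everything before the first dot in the title, dc counts the distinct base names left per splunk_server, and the where keeps servers with exactly 4 of them.

```python
import re
from collections import defaultdict

# Rows that survived `where fifteen_min_fill_perc > 99`, copied from the
# table in the original post as (title, splunk_server) pairs.
rows = (
    [(f"{q}.{i}", "x")
     for q in ("aggQueue", "indexQueue", "parsingQueue", "typingQueue")
     for i in range(3)]
    + [(f"aggQueue.{i}", "y") for i in range(3)]
    + [(f"indexQueue.{i}", "y") for i in range(3)]
    + [("parsingQueue.0", "y"), ("parsingQueue.1", "y")]
)

def hosts_with_all_base_queues(rows, required=4):
    """Model of: rex -> eventstats dc(basename) by splunk_server -> where distinct_count==4.

    Hypothetical helper for illustration only.
    """
    basenames = defaultdict(set)
    for title, server in rows:
        # Same pattern as the rex command: everything before the first dot.
        basename = re.match(r"[^.]+", title).group(0)
        basenames[server].add(basename)
    # dc(basename) is the size of each set; keep servers that reach `required`.
    return {server for server, names in basenames.items() if len(names) == required}

passing = hosts_with_all_base_queues(rows)
print(passing)
```

Because the threshold filter runs before the distinct count, a base queue name is counted as soon as at least one of its pipelines is above the threshold. A server therefore passes even if some pipelines of a queue are below the threshold, as long as one pipeline of each of the 4 queues is above it.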

In my own environments, I also keep an eye on blocked queues:

|  tstats latest(PREFIX(max_size_kb=)) as max_size_kb latest(PREFIX(largest_size=)) as largest_size where index=_internal source=*metrics.log* TERM(group=queue) TERM(blocked=true) by host PREFIX(name=)

Navanitha · 12-27-2022

@tscroggins Thank you for looking into my query. I tried the search you posted, and the results are the same as those from my own search. What I am looking for is a consolidated report. For example, in the output I pasted in my original post, instance "Y" has all four queues full (parsingQueue*, aggQueue*, typingQueue*, indexQueue*), so my output should contain only this instance name. I will then set up an alert for this host for further action. Any suggestions, please?

tscroggins · 12-27-2022

@Navanitha

In the table in your original post, only instance X would pass the new where clause. If you want to reduce the results to just an instance name, you can add stats, dedup, etc. to your search:

| stats count by splunk_server
| fields - count

These would replace the rename command.

Navanitha · 01-18-2023

The query seems to work, but only partially. When I run it, I get results for a splunk_server where one of the parsing queue pipelines is not above the threshold I set (>80). Per my requirement, server xyz should not show up, because its parsingQueue.0 is not above the threshold. (It should be reported only if all 4 queues across all 3 of its pipelines are above 80.)

title	fifteen_min_fill_perc	splunk_server

aggQueue.0	87.79	xyz
aggQueue.1	87.66	xyz
aggQueue.2	86.22	xyz
indexQueue.0	88.43	xyz
indexQueue.1	87.96	xyz
indexQueue.2	89.16	xyz
parsingQueue.0	65.10	xyz
parsingQueue.1	86.32	xyz
typingQueue.0	88.28	xyz
typingQueue.1	87.87	xyz
typingQueue.2	89.13	xyz

I would also appreciate it if you could help me understand why dc is used here and how it works.
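The stricter requirement described in this post (every pipeline of every one of the 4 queues above the threshold) can be sketched in Python as follows. This is an illustration under assumptions, not a tested Splunk search: the function name saturated_hosts is hypothetical, and the sketch assumes it sees all rows for a host, including those below the threshold, so the threshold is applied per host after grouping rather than per row beforehand.

```python
from collections import defaultdict

# The 4 base queue names that must all be present for a host.
REQUIRED = {"aggQueue", "indexQueue", "parsingQueue", "typingQueue"}

# Rows for server xyz, copied from the table in this post:
# (title, fifteen_min_fill_perc, splunk_server)
rows = [
    ("aggQueue.0", 87.79, "xyz"), ("aggQueue.1", 87.66, "xyz"), ("aggQueue.2", 86.22, "xyz"),
    ("indexQueue.0", 88.43, "xyz"), ("indexQueue.1", 87.96, "xyz"), ("indexQueue.2", 89.16, "xyz"),
    ("parsingQueue.0", 65.10, "xyz"), ("parsingQueue.1", 86.32, "xyz"),
    ("typingQueue.0", 88.28, "xyz"), ("typingQueue.1", 87.87, "xyz"), ("typingQueue.2", 89.13, "xyz"),
]

def saturated_hosts(rows, threshold):
    """Return hosts where all 4 base queues appear and every pipeline exceeds threshold.

    Hypothetical helper for illustration only.
    """
    by_host = defaultdict(list)
    for title, fill, server in rows:
        # Base name is everything before the first dot, e.g. "parsingQueue".
        by_host[server].append((title.split(".")[0], fill))
    result = set()
    for server, entries in by_host.items():
        names = {name for name, _ in entries}
        # A single pipeline at or below the threshold disqualifies the host.
        if REQUIRED <= names and all(fill > threshold for _, fill in entries):
            result.add(server)
    return result

print(saturated_hosts(rows, 80))
```

With a threshold of 80, xyz is excluded because parsingQueue.0 sits at 65.10. In a search, the same idea would mean aggregating per host before applying the threshold, instead of filtering individual rows first.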
