I need to create an alert that fires when all of the queues below are at 100% for a given indexer. I am using the built-in "DMC Alert - Saturated Event-Processing Queues" alert, but I need to tweak it so that it alerts only when all 4 queues ("aggQueue.*", "indexQueue.*", "parsingQueue.*", and "typingQueue.*") are at 100% for that host.
Query -
| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search title=tcpin_queue* OR title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2)
| fields title fifteen_min_fill_perc splunk_server
| where fifteen_min_fill_perc > 99
| rename splunk_server as Instance, title AS "Queue name", fifteen_min_fill_perc AS "Average queue fill percentage (last 15min)"
Output -
Queue name | Average queue fill percentage (last 15min) | Instance |
aggQueue.0 | 99.98 | x |
aggQueue.1 | 100.00 | x |
aggQueue.2 | 99.99 | x |
indexQueue.0 | 100.00 | x |
indexQueue.1 | 99.98 | x |
indexQueue.2 | 99.97 | x |
parsingQueue.0 | 100.00 | x |
parsingQueue.1 | 99.82 | x |
parsingQueue.2 | 99.98 | x |
typingQueue.0 | 99.96 | x |
typingQueue.1 | 99.99 | x |
typingQueue.2 | 99.96 | x |
aggQueue.0 | 100.00 | y |
aggQueue.1 | 100.00 | y |
aggQueue.2 | 100.00 | y |
indexQueue.0 | 100.00 | y |
indexQueue.1 | 100.00 | y |
indexQueue.2 | 100.00 | y |
parsingQueue.0 | 100.00 | y |
parsingQueue.1 | 100.00 | y |
Hi @Navanitha,
I use this search:
index=_internal source=*metrics.log sourcetype=splunkd group=queue
| eval name=case(name=="aggqueue","2 - Aggregation Queue",
name=="indexqueue", "4 - Indexing Queue",
name=="parsingqueue", "1 - Parsing Queue",
name=="typingqueue", "3 - Typing Queue",
name=="splunktcpin", "0 - TCP In Queue",
name=="tcpin_cooked_pqueue", "0 - TCP In Queue")
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=round((curr/max)*100,2)
| bin _time span=1m
| stats Median(fill_perc) AS "fill_percentage" max(max) AS max max(curr) AS curr by host, _time, name
| where (fill_percentage>70 AND name!="4 - Indexing Queue") OR (fill_percentage>70 AND name="4 - Indexing Queue")
| sort -_time
Ciao.
Giuseppe
Removing tcpin_queue* and counting the number of distinct base queue names by Splunk instance should allow you to alert when all 4 queues across any number of pipelines have breached your threshold:
| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search ```title=tcpin_queue* OR``` title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2)
| fields title fifteen_min_fill_perc splunk_server
| where fifteen_min_fill_perc > 99
| rex field=title "(?<basename>[^.]+)"
| eventstats dc(basename) as distinct_count by splunk_server
| where distinct_count==4
| fields - basename distinct_count
| rename splunk_server as Instance, title AS "Queue name", fifteen_min_fill_perc AS "Average queue fill percentage (last 15min)"
I've added the rex, eventstats, where, and fields commands on lines 6-9 to your original search.
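To see what the added commands do in isolation, here is a minimal sketch that runs against made-up rows instead of the REST endpoint. The servers a and b and their values are hypothetical; only the rex and eventstats lines come from the search above.

```
| makeresults
| eval row=split("a,aggQueue.0,100;a,aggQueue.1,100;a,indexQueue.0,100;a,parsingQueue.0,100;a,typingQueue.0,100;b,aggQueue.0,100;b,indexQueue.0,100;b,parsingQueue.0,100", ";")
| mvexpand row
| eval splunk_server=mvindex(split(row, ","), 0), title=mvindex(split(row, ","), 1), fifteen_min_fill_perc=tonumber(mvindex(split(row, ","), 2))
| fields - _time row
``` strip the pipeline suffix: aggQueue.0 and aggQueue.1 both become aggQueue ```
| rex field=title "(?<basename>[^.]+)"
``` count the distinct base queue names per server and attach the count to every row ```
| eventstats dc(basename) as distinct_count by splunk_server
```

If this behaves as I expect, every row for server a gets distinct_count=4, because aggQueue.0 and aggQueue.1 share the base name aggQueue and are counted once. The rows for server b get distinct_count=3 and would be dropped by where distinct_count==4.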
In my own environments, I also keep an eye on blocked queues:
| tstats latest(PREFIX(max_size_kb=)) as max_size_kb latest(PREFIX(largest_size=)) as largest_size where index=_internal source=*metrics.log* TERM(group=queue) TERM(blocked=true) by host PREFIX(name=)
@tscroggins Thank you for looking into my query. I tried the search you posted and the results are the same as with my search. What I am looking for is a consolidated report. For example, in the output I pasted in my original post, instance "Y" has all four queues full (title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*), so my output should contain only this instance name. I will set up an alert for this host for further action. Any suggestions, please?
In the table in your original post, only instance X would pass the new where clause. If you want to reduce the results to just an instance name, you can add stats, dedup, etc. to your search:
| stats count by splunk_server
| fields - count
These would replace the rename command.
The query seems to work, but only partially. When I run it, I get results for a splunk_server where one of the parsing queue pipelines is not above the threshold I set (>80). Per my requirement, server xyz should not show up, because its parsingQueue.0 is not above the threshold. (It should be reported only if all 4 queues across all 3 pipelines are above 80.)
title | fifteen_min_fill_perc | splunk_server |
aggQueue.0 | 87.79 | xyz |
aggQueue.1 | 87.66 | xyz |
aggQueue.2 | 86.22 | xyz |
indexQueue.0 | 88.43 | xyz |
indexQueue.1 | 87.96 | xyz |
indexQueue.2 | 89.16 | xyz |
parsingQueue.0 | 65.10 | xyz |
parsingQueue.1 | 86.32 | xyz |
typingQueue.0 | 88.28 | xyz |
typingQueue.1 | 87.87 | xyz |
typingQueue.2 | 89.13 | xyz |
I would appreciate it if you could also help me understand why dc is used here and how it works.
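On the dc question: dc(basename) counts the distinct base queue names per splunk_server among the rows that survive the threshold filter. A single pipeline above the threshold is therefore enough for its queue type to be counted, which would explain why a server with one slow-filling parsing pipeline can still reach a count of 4. A stricter variant, offered as an untested sketch, is to drop the per-row where clause and compare the lowest fill percentage per server against the threshold instead. The field names min_fill_perc and queue_types are names I am introducing here; the threshold of 80 is taken from the post above.

```
| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2)
| rex field=title "(?<basename>[^.]+)"
``` one row per server: the lowest fill across all pipelines, and how many queue types were seen ```
| stats min(fifteen_min_fill_perc) as min_fill_perc dc(basename) as queue_types by splunk_server
``` keep a server only if all 4 queue types are present and even its least-full pipeline is above 80 ```
| where queue_types==4 AND min_fill_perc > 80
| fields splunk_server
```

Because min() looks at every pipeline of every queue, a server like xyz with parsingQueue.0 at 65.10 should be excluded under this approach, regardless of how many pipelines each indexer runs.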