
How to alert when all the queues on a given indexer are full?

Navanitha
Path Finder

I need to create an alert when all of the queues below are at 100% for a given indexer. For this I am using the built-in "DMC Alert - Saturated Event-Processing Queues" alert, but I need to tweak it a little so that it alerts only when all 4 queues, "aggQueue.*", "indexQueue.*", "parsingQueue.*" and "typingQueue.*", are at 100% for that host.

Query:

| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search title=tcpin_queue* OR title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2)
| fields title fifteen_min_fill_perc splunk_server
| where fifteen_min_fill_perc > 99
| rename splunk_server as Instance, title AS "Queue name", fifteen_min_fill_perc AS "Average queue fill percentage (last 15min)"
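As a rough illustration of what the eval and where steps in this search compute, here is a hypothetical Python model (not Splunk code). It assumes each REST result row is a dict carrying the title, splunk_server, value_cntr3_size_bytes_lookback and max_size_bytes fields named in the query; the function names and the byte counts are made up for illustration. It also makes visible that each queue row is tested on its own, so nothing here yet relates the rows of one instance to each other.

```python
# Hypothetical model of the eval and where steps in the search above; not Splunk code.
# Field names are copied from the search; function names are illustrative only.


def fifteen_min_fill_perc(row):
    """Model of: eval fifteen_min_fill_perc = round(lookback / max_size_bytes * 100, 2)."""
    return round(row["value_cntr3_size_bytes_lookback"] / row["max_size_bytes"] * 100, 2)


def rows_over_threshold(rows, threshold=99):
    """Model of: fields title fifteen_min_fill_perc splunk_server | where fifteen_min_fill_perc > 99."""
    kept = []
    for row in rows:
        perc = fifteen_min_fill_perc(row)
        if perc > threshold:
            # Each queue row is judged on its own; rows of the same instance are never compared.
            kept.append({"title": row["title"],
                         "fifteen_min_fill_perc": perc,
                         "splunk_server": row["splunk_server"]})
    return kept


if __name__ == "__main__":
    # Made-up byte counts, for illustration only.
    sample = [
        {"title": "parsingQueue.0", "splunk_server": "x",
         "value_cntr3_size_bytes_lookback": 512, "max_size_bytes": 512},
        {"title": "indexQueue.0", "splunk_server": "x",
         "value_cntr3_size_bytes_lookback": 256, "max_size_bytes": 512},
    ]
    print(rows_over_threshold(sample))
```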

 

Output:

Queue name | Average queue fill percentage (last 15min) | Instance
aggQueue.0 | 99.98 | x
aggQueue.1 | 100.00 | x
aggQueue.2 | 99.99 | x
indexQueue.0 | 100.00 | x
indexQueue.1 | 99.98 | x
indexQueue.2 | 99.97 | x
parsingQueue.0 | 100.00 | x
parsingQueue.1 | 99.82 | x
parsingQueue.2 | 99.98 | x
typingQueue.0 | 99.96 | x
typingQueue.1 | 99.99 | x
typingQueue.2 | 99.96 | x
aggQueue.0 | 100.00 | y
aggQueue.1 | 100.00 | y
aggQueue.2 | 100.00 | y
indexQueue.0 | 100.00 | y
indexQueue.1 | 100.00 | y
indexQueue.2 | 100.00 | y
parsingQueue.0 | 100.00 | y
parsingQueue.1 | 100.00 | y

 


gcusello
SplunkTrust

Hi @Navanitha,

I use this search:

index=_internal  source=*metrics.log sourcetype=splunkd group=queue 
| eval name=case(name=="aggqueue","2 - Aggregation Queue",
 name=="indexqueue", "4 - Indexing Queue",
 name=="parsingqueue", "1 - Parsing Queue",
 name=="typingqueue", "3 - Typing Queue",
 name=="splunktcpin", "0 - TCP In Queue",
 name=="tcpin_cooked_pqueue", "0 - TCP In Queue") 
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) 
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) 
| eval fill_perc=round((curr/max)*100,2) 
| bin _time span=1m
| stats Median(fill_perc) AS "fill_percentage" max(max) AS max max(curr) AS curr by host, _time, name 
| where (fill_percentage>70 AND name!="4 - Indexing Queue") OR (fill_percentage>70 AND name="4 - Indexing Queue")
| sort -_time

Ciao.

Giuseppe
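To make the arithmetic in the metrics.log search above concrete, here is a hypothetical Python model (not Splunk code). It mirrors the case() label mapping, the if(isnotnull(...)) fallback from the *_kb size fields to the unsuffixed ones, the one-minute bin, and the median per host, minute and queue label, then keeps groups above 70. It assumes each event is a dict of metrics.log fields plus "_time" in epoch seconds, and it leaves out the max(max) and max(curr) columns for brevity. The function names are illustrative only.

```python
# Hypothetical model of the metrics.log search above; not Splunk code.
from collections import defaultdict
from statistics import median

# Model of the case() mapping; other queue names map to None, as case() yields null.
QUEUE_LABELS = {
    "parsingqueue": "1 - Parsing Queue",
    "aggqueue": "2 - Aggregation Queue",
    "typingqueue": "3 - Typing Queue",
    "indexqueue": "4 - Indexing Queue",
    "splunktcpin": "0 - TCP In Queue",
    "tcpin_cooked_pqueue": "0 - TCP In Queue",
}


def first_not_null(*values):
    """Model of if(isnotnull(a), a, b): the first value that is not None."""
    for value in values:
        if value is not None:
            return value
    return None


def median_fill_by_minute(events, threshold=70):
    """Median fill percentage per (host, minute, queue label), kept when above threshold."""
    groups = defaultdict(list)
    for event in events:
        label = QUEUE_LABELS.get(event.get("name"))
        if label is None:
            continue  # stats ... by name ignores events whose name is null
        size_max = first_not_null(event.get("max_size_kb"), event.get("max_size"))
        size_curr = first_not_null(event.get("current_size_kb"), event.get("current_size"))
        if not size_max or size_curr is None:
            continue  # no usable sizes, so fill_perc would be null
        minute = int(event["_time"]) // 60 * 60  # model of: bin _time span=1m
        groups[(event["host"], minute, label)].append(round(size_curr / size_max * 100, 2))
    result = {}
    for key, values in groups.items():
        fill = median(values)
        if fill > threshold:  # both branches of the original where clause test > 70
            result[key] = fill
    return result
```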


tscroggins
Influencer

@Navanitha 

Removing tcpin_queue* and counting the number of distinct base queue names by Splunk instance should allow you to alert when all 4 queues across any number of pipelines have breached your threshold:

| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search ```title=tcpin_queue* OR``` title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2) 
| fields title fifteen_min_fill_perc splunk_server 
| where fifteen_min_fill_perc > 99
| rex field=title "(?<basename>[^.]+)" 
| eventstats dc(basename) as distinct_count by splunk_server
| where distinct_count==4
| fields - basename distinct_count
| rename splunk_server as Instance, title AS "Queue name", fifteen_min_fill_perc AS "Average queue fill percentage (last 15min)"

I've added the rex, eventstats, where, and fields commands on lines 6-9 to your original search.
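The effect of those four added commands can be modeled in a few lines of Python. This is a hypothetical illustration, not Splunk code: rows are (title, fifteen_min_fill_perc, splunk_server) tuples taken from the output table in the original post. The regex is the same pattern as the rex. eventstats is modeled as attaching a per-instance distinct count to every row instead of collapsing the rows.

```python
# Hypothetical model of the rex, eventstats, and where steps above; not Splunk code.
import re
from collections import defaultdict

# Same pattern as the rex: the text before the first "." (for example "aggQueue").
BASENAME = re.compile(r"[^.]+")

# (title, fifteen_min_fill_perc, splunk_server) rows from the output in the original post.
ORIGINAL_OUTPUT = [
    ("aggQueue.0", 99.98, "x"), ("aggQueue.1", 100.00, "x"), ("aggQueue.2", 99.99, "x"),
    ("indexQueue.0", 100.00, "x"), ("indexQueue.1", 99.98, "x"), ("indexQueue.2", 99.97, "x"),
    ("parsingQueue.0", 100.00, "x"), ("parsingQueue.1", 99.82, "x"), ("parsingQueue.2", 99.98, "x"),
    ("typingQueue.0", 99.96, "x"), ("typingQueue.1", 99.99, "x"), ("typingQueue.2", 99.96, "x"),
    ("aggQueue.0", 100.00, "y"), ("aggQueue.1", 100.00, "y"), ("aggQueue.2", 100.00, "y"),
    ("indexQueue.0", 100.00, "y"), ("indexQueue.1", 100.00, "y"), ("indexQueue.2", 100.00, "y"),
    ("parsingQueue.0", 100.00, "y"), ("parsingQueue.1", 100.00, "y"),
]


def rows_for_saturated_instances(rows, threshold=99, expected=4):
    # where fifteen_min_fill_perc > 99
    kept = [row for row in rows if row[1] > threshold]
    # eventstats dc(basename) by splunk_server: distinct base names per instance,
    # attached to every row rather than collapsing the rows
    names = defaultdict(set)
    for title, _perc, server in kept:
        names[server].add(BASENAME.search(title).group(0))
    # where distinct_count==4
    return [row for row in kept if len(names[row[2]]) == expected]
```

In this model, instance y has only three distinct base names, because no typingQueue rows appear for it, so all twelve surviving rows belong to x.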

In my own environments, I also keep an eye on blocked queues:

|  tstats latest(PREFIX(max_size_kb=)) as max_size_kb latest(PREFIX(largest_size=)) as largest_size where index=_internal source=*metrics.log* TERM(group=queue) TERM(blocked=true) by host PREFIX(name=)

Navanitha
Path Finder

@tscroggins Thank you for looking into my query. I tried the search you posted, and the results are the same as with my search. What I am looking for is a consolidated report. For example, in the output I pasted in my original post, instance "y" has all four queues full (parsingQueue*, aggQueue*, typingQueue* and indexQueue*), so my output should be only this instance name. I will set up an alert for this host for further action. Any suggestions, please?

 


tscroggins
Influencer

@Navanitha 

In the table in your original post, only instance x would pass the new where clause: instance y has no typingQueue rows in that output, so its distinct count is 3, not 4. If you want to reduce the results to just an instance name, you can add stats, dedup, etc. to your search:

| stats count by splunk_server
| fields - count

These would replace the rename command.
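stats count by splunk_server emits one row per distinct splunk_server value. fields - count then drops the count column, leaving only the instance names. The hypothetical Python model below (not Splunk code) shows that reduction on (title, fifteen_min_fill_perc, splunk_server) tuples. The ascending order reflects my understanding of how stats sorts by its by field; it is not something stated in this thread.

```python
# Hypothetical model of: stats count by splunk_server | fields - count
# Not Splunk code; rows are (title, fifteen_min_fill_perc, splunk_server) tuples.


def instance_names(rows):
    """One entry per distinct splunk_server, in ascending order, with the count dropped."""
    counts = {}
    for _title, _perc, server in rows:
        counts[server] = counts.get(server, 0) + 1  # stats count by splunk_server
    return sorted(counts)  # fields - count leaves only the instance names
```

With one row per instance, an alert could then fire on the number of results, with each result naming a saturated indexer.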


Navanitha
Path Finder

The query seems to work, but only partially. When I run it, I get results for a splunk_server even though one of its parsing queue pipelines is not above the threshold I set (> 80). Per my requirement, server xyz should not show up, because its parsingQueue.0 is not above the threshold. A server should only be reported if all 3 pipelines of all 4 queues are above 80.

title | fifteen_min_fill_perc | splunk_server
aggQueue.0 | 87.79 | xyz
aggQueue.1 | 87.66 | xyz
aggQueue.2 | 86.22 | xyz
indexQueue.0 | 88.43 | xyz
indexQueue.1 | 87.96 | xyz
indexQueue.2 | 89.16 | xyz
parsingQueue.0 | 65.10 | xyz
parsingQueue.1 | 86.32 | xyz
typingQueue.0 | 88.28 | xyz
typingQueue.1 | 87.87 | xyz
typingQueue.2 | 89.13 | xyz

I'd also appreciate it if you could help me understand why dc is used here and how it works.
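The dc question can be illustrated with a hypothetical Python model (not Splunk code) of the xyz rows above. dc(basename) counts the distinct values of basename among the rows that reach it. The where clause removes rows one at a time, so a single surviving pipeline such as parsingQueue.1 (86.32) is enough for parsingQueue to be counted. As a result, xyz still reaches 4 even though parsingQueue.0 is at 65.10. One possible alternative, sketched here as an assumption rather than a tested search, is a three-step check: take the minimum fill per instance and base name, keep only base names whose minimum is above the threshold, and then count them. In SPL that ordering might look like `rex ... | stats min(fifteen_min_fill_perc) AS min_fill BY splunk_server basename | where min_fill > 80 | stats dc(basename) AS full_queues BY splunk_server | where full_queues == 4`, with the earlier row-level where removed. This SPL has not been run.

```python
# Hypothetical model of two ways to decide "every queue is full"; not Splunk code.
import re
from collections import defaultdict

BASENAME = re.compile(r"[^.]+")  # same pattern as the rex: text before the first "."

# (title, fifteen_min_fill_perc, splunk_server) rows posted above for server xyz.
XYZ_ROWS = [
    ("aggQueue.0", 87.79, "xyz"), ("aggQueue.1", 87.66, "xyz"), ("aggQueue.2", 86.22, "xyz"),
    ("indexQueue.0", 88.43, "xyz"), ("indexQueue.1", 87.96, "xyz"), ("indexQueue.2", 89.16, "xyz"),
    ("parsingQueue.0", 65.10, "xyz"), ("parsingQueue.1", 86.32, "xyz"),
    ("typingQueue.0", 88.28, "xyz"), ("typingQueue.1", 87.87, "xyz"), ("typingQueue.2", 89.13, "xyz"),
]


def filter_then_dc(rows, threshold):
    """Current search: where perc > threshold, then dc(basename) by splunk_server."""
    names = defaultdict(set)
    for title, perc, server in rows:
        if perc > threshold:  # rows are judged one at a time
            names[server].add(BASENAME.search(title).group(0))
    return {server: len(found) for server, found in names.items()}


def min_then_dc(rows, threshold):
    """Alternative: min(perc) by server and basename, keep minima above threshold, then dc."""
    lowest = {}
    for title, perc, server in rows:
        key = (server, BASENAME.search(title).group(0))
        lowest[key] = min(perc, lowest.get(key, perc))
    names = defaultdict(set)
    for (server, basename), low in lowest.items():
        if low > threshold:  # every returned pipeline of this queue is above threshold
            names[server].add(basename)
    return {server: len(found) for server, found in names.items()}


def servers_to_alert(counts, expected=4):
    """Instances where all expected base queue names passed."""
    return sorted(server for server, count in counts.items() if count == expected)
```

The model only sees pipelines that appear in the results. In the rows above, parsingQueue.2 is missing for xyz, and a minimum cannot detect a missing pipeline. Also checking the number of pipelines per base name would be one way to cover that case.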
