
How to alert if all the queues for a given indexer get full?

Navanitha
Path Finder

I need to create an alert that fires when all of the queues below are at 100% on a given indexer. I am using the built-in "DMC Alert - Saturated Event-Processing Queues" alert, but I need to tweak it so that it alerts only when all 4 queues ("aggQueue.*", "indexQueue.0*", "parsingQueue.*" and "typingQueue.0") are at 100% for that host.

Query - 

| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search title=tcpin_queue* OR title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2)
| fields title fifteen_min_fill_perc splunk_server
| where fifteen_min_fill_perc > 99
| rename splunk_server as Instance, title AS "Queue name", fifteen_min_fill_perc AS "Average queue fill percentage (last 15min)"

 

Output -

Queue name       Average queue fill percentage (last 15min)   Instance
aggQueue.0       99.98                                        x
aggQueue.1       100.00                                       x
aggQueue.2       99.99                                        x
indexQueue.0     100.00                                       x
indexQueue.1     99.98                                        x
indexQueue.2     99.97                                        x
parsingQueue.0   100.00                                       x
parsingQueue.1   99.82                                        x
parsingQueue.2   99.98                                        x
typingQueue.0    99.96                                        x
typingQueue.1    99.99                                        x
typingQueue.2    99.96                                        x
aggQueue.0       100.00                                       y
aggQueue.1       100.00                                       y
aggQueue.2       100.00                                       y
indexQueue.0     100.00                                       y
indexQueue.1     100.00                                       y
indexQueue.2     100.00                                       y
parsingQueue.0   100.00                                       y
parsingQueue.1   100.00                                       y

 


gcusello
SplunkTrust

Hi @Navanitha,

I use this search:

index=_internal  source=*metrics.log sourcetype=splunkd group=queue 
| eval name=case(name=="aggqueue","2 - Aggregation Queue",
 name=="indexqueue", "4 - Indexing Queue",
 name=="parsingqueue", "1 - Parsing Queue",
 name=="typingqueue", "3 - Typing Queue",
 name=="splunktcpin", "0 - TCP In Queue",
 name=="tcpin_cooked_pqueue", "0 - TCP In Queue") 
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) 
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) 
| eval fill_perc=round((curr/max)*100,2) 
| bin _time span=1m
| stats Median(fill_perc) AS "fill_percentage" max(max) AS max max(curr) AS curr by host, _time, name 
| where (fill_percentage>70 AND name!="4 - Indexing Queue") OR (fill_percentage>70 AND name="4 - Indexing Queue")
| sort -_time

Ciao.

Giuseppe


tscroggins
Influencer

@Navanitha 

Removing tcpin_queue* and counting the number of distinct base queue names by Splunk instance should allow you to alert when all 4 queues across any number of pipelines have breached your threshold:

| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search ```title=tcpin_queue* OR``` title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2) 
| fields title fifteen_min_fill_perc splunk_server 
| where fifteen_min_fill_perc > 99
| rex field=title "(?<basename>[^.]+)" 
| eventstats dc(basename) as distinct_count by splunk_server
| where distinct_count==4
| fields - basename distinct_count
| rename splunk_server as Instance, title AS "Queue name", fifteen_min_fill_perc AS "Average queue fill percentage (last 15min)"

I've added the rex, eventstats, where, and fields commands on lines 6-9 to your original search.
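
As a way to see what the eventstats line does, here is a minimal sketch that can be run on any search head. The server names x and y and the queue titles are made-up sample data, and the raw field with its server:title encoding is only scaffolding for this illustration, not output from a real indexer.

```spl
| makeresults
| eval raw="x:aggQueue.0 x:indexQueue.0 x:parsingQueue.0 x:typingQueue.0 y:aggQueue.0 y:aggQueue.1 y:indexQueue.0"
| makemv raw
| mvexpand raw
| rex field=raw "(?<splunk_server>[^:]+):(?<title>.+)"
| rex field=title "(?<basename>[^.]+)"
| eventstats dc(basename) as distinct_count by splunk_server
| table splunk_server title basename distinct_count
```

Every row for x should get distinct_count=4 (aggQueue, indexQueue, parsingQueue, typingQueue), while every row for y gets 2 (aggQueue, indexQueue). dc counts distinct base names, and rex has already stripped the pipeline suffix (.0, .1), so two pipelines of the same queue count once. A following where distinct_count==4 would keep only the x rows.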

In my own environments, I also keep an eye on blocked queues:

|  tstats latest(PREFIX(max_size_kb=)) as max_size_kb latest(PREFIX(largest_size=)) as largest_size where index=_internal source=*metrics.log* TERM(group=queue) TERM(blocked=true) by host PREFIX(name=)
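
For readers less used to tstats, a roughly equivalent search-time version is sketched below. It assumes automatic key=value field extraction applies to the metrics.log events, so that group, blocked, name, max_size_kb and largest_size are available as search-time fields. It will generally be slower than the tstats form because it reads the raw events.

```spl
index=_internal source=*metrics.log* group=queue blocked=true
| stats latest(max_size_kb) as max_size_kb latest(largest_size) as largest_size by host name
```

The tstats version answers the same question from the indexed terms alone: TERM() matches the literal tokens group=queue and blocked=true, and PREFIX() reads the values of tokens that start with the given prefix.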

Navanitha
Path Finder

@tscroggins Thank you for looking into my query. I tried the search you posted, and the results are the same as those from my own search. What I am looking for is a consolidated report. For example, in the output I pasted in my original post, instance "Y" has all four queues full (parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue), so my output should contain only this instance name. I will set up an alert for this host for further action. Any suggestions, please?

 


tscroggins
Influencer

@Navanitha 

In the table in your original post, only instance X would pass the new where clause. If you want to reduce the results to just an instance name, you can add stats, dedup, etc. to your search:

| stats count by splunk_server
| fields - count

These would replace the rename command.
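
Put together, the search with that replacement applied would look like the sketch below. It is only the pieces from this thread assembled in order. The threshold of 99 is the one from the original search and can be adjusted.

```spl
| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2)
| fields title fifteen_min_fill_perc splunk_server
| where fifteen_min_fill_perc > 99
| rex field=title "(?<basename>[^.]+)"
| eventstats dc(basename) as distinct_count by splunk_server
| where distinct_count==4
| stats count by splunk_server
| fields - count
```

Each result row is one instance where all four base queue names have at least one pipeline over the threshold, so the alert can trigger when the number of results is greater than zero.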


Navanitha
Path Finder

The query seems to be working, but only partially. When I run it, I get results for a splunk_server where one of the parsing queue pipelines is not above the threshold I set (>80). Per my requirement, server xyz should not show up, because its parsingQueue.0 is not above the threshold. (It should be reported only if all 4 queues in all 3 of its pipelines are above 80.)

title            fifteen_min_fill_perc   splunk_server
aggQueue.0       87.79                   xyz
aggQueue.1       87.66                   xyz
aggQueue.2       86.22                   xyz
indexQueue.0     88.43                   xyz
indexQueue.1     87.96                   xyz
indexQueue.2     89.16                   xyz
parsingQueue.0   65.10                   xyz
parsingQueue.1   86.32                   xyz
typingQueue.0    88.28                   xyz
typingQueue.1    87.87                   xyz
typingQueue.2    89.13                   xyz

I would also appreciate it if you could help me understand why dc is used here and how it works.
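
On the dc question: dc(basename) is a distinct count. rex strips the pipeline suffix, so parsingQueue.1 and parsingQueue.2 both become parsingQueue, and eventstats then counts how many different base names remain for each splunk_server among the rows that passed the threshold. A count of 4 therefore means at least one pipeline of each queue type is over the threshold, not every pipeline, which is why a server can still qualify when a single parsingQueue pipeline is below it.

To require every pipeline of every queue to be over the threshold, one option is to skip the row-level filter and compare the lowest fill percentage per instance instead. This is a sketch that has not been tested against a live DMC. The names queue_types and lowest_fill_perc are made up for this example, and the threshold of 80 is the one mentioned above.

```spl
| rest splunk_server_group=dmc_group_indexer /services/server/introspection/queues
| search title=parsingQueue* OR title=aggQueue* OR title=typingQueue* OR title=indexQueue*
| eval fifteen_min_fill_perc = round(value_cntr3_size_bytes_lookback / max_size_bytes * 100,2)
| rex field=title "(?<basename>[^.]+)"
| stats dc(basename) as queue_types min(fifteen_min_fill_perc) as lowest_fill_perc by splunk_server
| where queue_types==4 AND lowest_fill_perc > 80
| fields splunk_server
```

With the xyz data above, lowest_fill_perc would be 65.10, so xyz would be dropped. Rows where the fill percentage cannot be computed (for example, a missing max_size_bytes) are ignored by min, so they would not keep an instance from being reported.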
