Splunk Dev

Why is the parsingQueue blocking on only one server?

sochsenbein
Communicator

Out of 19 Windows servers running the same services, there is one server that keeps blocking at the parsingQueue. I have increased its size to 30MB while the others remain under 10MB, but it still blocks.

I ran the following search to check how many events hit each server and found that the counts are roughly even across all of them:

index=_internal host=<server_name>* group=queue name=parsingqueue | timechart span=60m limit=0 count by host

Next, I ran this search to check the size of the queue and found that while the rest of the servers sit at roughly 1,000, the server that is blocking is above 70K!

index=_internal host=<server_name>* group=queue name=parsingqueue | timechart span=60m limit=0 sum(current_size) by host

They all run with the same system specs: 64-bit, 3.07 GHz, 12 cores (6 cores per CPU), and 96 GB of RAM. They all have plenty of disk space as well. Is there a way to check the fill and drain rates for the queue? Also, I am not sure how far I can increase the queue size before it becomes dangerous. Is there anything else I have missed?
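A normalized version of the second search above, showing fill percentage instead of raw size, makes the comparison clearer (this assumes the current_size_kb and max_size_kb fields are present in these queue metrics events):

 | tstats count where index=_internal | noop
 index=_internal host=<server_name>* group=queue name=parsingqueue | eval fill_pct=round(100*current_size_kb/max_size_kb,1) | timechart span=60m avg(fill_pct) by host

A queue sitting near 100% fill regardless of how large you make it points at a slow consumer rather than a burst of input.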

TIA,

Skyler


DalJeanis
Legend

To use a metaphor someone else used recently on Slack: when your toilet is backing up, you do not increase the size of the bathroom... you find out why it is plugged. Glad you're here to check on that, and please accept my appreciation of your excellent write-up, which covered the first three things I'd look at.

In fact, two of the suggestions I'm about to make are basically saying: look one level deeper at the same places you've already looked...

Here are some triage steps I would take:

1) Verify that the indexer has the EXACT same configuration as all the others. (You've listed the hardware specs, but literally think of every setting you can and check them all.) Specifically, don't look at just the entire disk drive; look at the volumes specifically allocated to Splunk.

2) Search and see if there is some source, host, sourcetype, etc. that is aiming at that indexer and not at any other indexer (this is the FIRST thing I'd check). So we're not talking about just how many events, but what KIND of events...
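As a quick first pass at this (aircode again, with "mybadboy" standing in for the problem indexer), something like this lists which hosts and sourcetypes are landing on that peer, heaviest first:

 | tstats count where index=* splunk_server="mybadboy" by host, sourcetype 
 | sort - count

Anything near the top of that list that you don't see on the healthy peers is your prime suspect.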

3) Check the total number of bytes being indexed by each indexer, and see if that one is way off from the others. If it is NOT, then number 2 above MUST apply... something is sending preferentially to that indexer. If it IS, then you know it is something about the indexer itself. Figuring out what may be as simple as (aircode)

 | tstats count as countbyserver where index=* by sourcetype, splunk_server
 | eventstats sum(countbyserver) as countbysourcetype by sourcetype
 | where splunk_server="mybadboy"
 | eval ratio=(100.00*countbyserver)/countbysourcetype
 | where ratio > 10

That should give you a list of sourcetypes where that bad boy is getting more than 10% of the traffic for the sourcetype. If none of those turn up anything, run one that calculates the percentage of traffic from each sending host.
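The host version is the same shape (still aircode — swap in your indexer's name for "mybadboy"):

 | tstats count as countbyserver where index=* by host, splunk_server
 | eventstats sum(countbyserver) as countbyhost by host
 | where splunk_server="mybadboy"
 | eval ratio=(100.00*countbyserver)/countbyhost
 | where ratio > 10

Any sending host over that 10% line is favoring your bad boy instead of spreading its load across the peers.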

4) If all else fails, take that indexer offline and see if any other indexer suddenly exhibits the same behavior.
