Splunk Enterprise Security

Receiving blocked=true while syslog/heavy forwarders try to send data to indexer servers

sureshkumaar
Path Finder

Hi All,

I have 4 heavy forwarder servers sending data to 5 indexers:

server1 acts as a syslog server and has autoLBFrequency = 10 and maxQueueSize = 1000MB

server2 acts as a syslog server and heavy forwarder and has autoLBFrequency = 10 and maxQueueSize = 500MB

server3 acts as a heavy forwarder and has autoLBFrequency = 10 and maxQueueSize = 500MB

server4 acts as a heavy forwarder and has autoLBFrequency = 10 and maxQueueSize = 500MB

Receiving blocked=true in metrics.log while the syslog/heavy forwarders try to send data to the indexer servers. Because of this, index ingestion is getting delayed and data is arriving in Splunk 2-3 hours late.

Also, one of the 5 indexer servers is consistently at 99-100% CPU utilization. That server has 24 CPUs, and the other indexer servers are also running with 24 CPUs.

Planning to upgrade only the highly utilized indexer server from 24 to 32 CPUs.

Kindly suggest whether updating the settings below in outputs.conf will reduce/stop the "blocked=true" messages in metrics.log and bring the CPU load on the indexer back to normal before upgrading the CPU,

OR whether we need to do both, the changes in outputs.conf and the CPU upgrade. If both are needed, which should we try first? Kindly help.

autoLBFrequency = 5
maxQueueSize = 1000MB
aggQueueSize = 7000
outputQueueSize = 7000
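
For reference, this is roughly where the first two of these settings would sit in outputs.conf on each heavy forwarder (the group name and the indexer host:port list below are placeholders, not my real environment):

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1:9997, idx2:9997, idx3:9997, idx4:9997, idx5:9997
autoLBFrequency = 5
maxQueueSize = 1000MB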


sureshkumaar
Path Finder

As per the Monitoring Console, I could see that the indexing queue and the splunktcpin queue are high.

[Screenshot 2025-04-24 094537.png]
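
A minimal search sketch to quantify the same thing from metrics.log (it assumes the standard group=queue lines, where Splunk emits current_size_kb and max_size_kb; the fill_pct field name is just for illustration):

index=_internal source=*metrics.log group=queue (name=indexqueue OR name=splunktcpin)
| eval fill_pct = round(100 * current_size_kb / max_size_kb, 1)
| timechart span=10m perc95(fill_pct) by name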


livehybrid
Super Champion

Hi @sureshkumaar 

Further to my last reply - there are also a couple of worthwhile resources here which give an overview of how to identify and deal with blocked queues.

https://docs.splunk.com/Documentation/Splunk/8.2.4/Deploy/Datapipeline

How to Troubleshoot Blocked Ingestion Pipeline Queues with Indexers and Forwarders - https://conf.sp...

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


sureshkumaar
Path Finder

Thanks @livehybrid for your inputs. I checked blocked=true on all 4 heavy forwarders and found it on the one that acts as the syslog server collecting the network-related data; the typing queue is the one blocking there, which the PDF describes as the bottleneck.

As per the PDF, I grepped metrics.log to see which sourcetype and host are consuming the most regex CPU:

04-22-2025 05:19:58.017 +0700 INFO Metrics - group=per_sourcetype_regex_cpu, series="cp_log", cpu=604, cpupe=0.0005149352537121802, bytes=1072305900, ev=1172963

04-22-2025 05:19:58.011 +0700 INFO Metrics - group=per_host_regex_cpu, series="networkserver", cpu=596, cpupe=0.0005081981051714273, bytes=1072185809, ev=1172771

[Screenshot 2025-04-22 114936.png]
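
To rank the heaviest contributors instead of reading single lines, a rough search over the same metrics groups (using only the cpu, ev, and series fields visible in the lines above) could look like this:

index=_internal source=*metrics.log (group=per_sourcetype_regex_cpu OR group=per_host_regex_cpu)
| stats sum(cpu) as total_cpu, sum(ev) as total_events by group, series
| sort - total_cpu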

Kindly let me know what to do next


livehybrid
Super Champion

Hi

Increasing autoLBFrequency, maxQueueSize, aggQueueSize, or outputQueueSize in outputs.conf on your heavy forwarders may help temporarily reduce "blocked=true" messages, but these settings do not address the root cause: your indexer(s) are overloaded and unable to keep up with incoming data.

The following will tell you which queues are blocking on which servers:

index=_internal source=*metrics.log blocked=true
| stats count by host, group, name
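
A variant of the same search can show when the blocking happens and which queue is involved; add a host filter (the hostname below is a placeholder) to focus on your syslog heavy forwarder:

index=_internal source=*metrics.log blocked=true host=<your_syslog_hf>
| timechart span=10m count by name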

 

  • "blocked=true" in metrics.log means the forwarder cannot send data to the indexer because the indexer is not accepting it fast enough (usually due to CPU, disk, or queue saturation).
  • Increasing forwarder queue sizes only buffers more data; it does not fix indexer bottlenecks.
  • The indexer with 99–100% CPU is a clear bottleneck. Upgrading its CPU may help, but if the load is not balanced across all indexers, you may need to investigate why (e.g., uneven load balancing, hot buckets, or misconfiguration) - see the search sketch after this list for a quick way to compare volume per indexer.
  • Lowering autoLBFrequency (e.g., from 10 to 5) can help distribute load more evenly, but will not solve indexer resource exhaustion.
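
As a quick check of the distribution, here is a minimal sketch using the built-in splunk_server field: roughly equal counts across the five indexers suggest load balancing is working, while one indexer dominating points at a sticky feed.

| tstats count where index=* earliest=-1h by splunk_server
| sort - count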

 

Do not rely solely on queue size increases; this can delay but not prevent data loss if indexers remain overloaded.

Investigate why one indexer is overloaded (check for hot buckets, network issues, or misconfigured load balancing). Understanding *why* the single indexer is blocking is probably the most important thing here - it could be a number of things, but it is likely to be either a resource issue (e.g. a faulty disk) or one of your syslog feeds failing to balance across the other indexers.

Is it always the same indexer that runs hot? Or does it change?

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing
