Our data flow: a syslog server sends a large volume of data to one heavy forwarder (HF1), which routes it to an indexer cluster as well as to a second heavy forwarder (HF2). HF2 then routes the data to SyslogNG and to another indexer cluster located in a different environment.
Due to the high volume of data from the syslog server, we were facing backpressure on the syslog server and both HFs.
The vendor recommended increasing the pipeline queue size to 2500MB under server.conf on both HFs and on the syslog server, which we did.
The issue now is that HF2 has been consuming its full memory (92GB) since a recent server reboot. After reaching 100% memory usage, HF2 hangs. If we decrease the parallel pipelines from 2 to 1 on HF2, it creates backpressure on the syslog server and HF1, and the pipelines overflow.
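For reference, the vendor-recommended change looks roughly like this in server.conf (a paraphrased sketch, not our exact files; the queue size was applied on the syslog server and both HFs, and the two parallel pipelines are configured on HF2):

# server.conf (sketch of the current settings)
[queue]
maxSize = 2500MB

[general]
parallelIngestionPipelines = 2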
Before the HF2 reboot, memory consumption was under 10GB with the same 2500MB pipeline queue size, and the splunkd process was normal.
Note: so far HF1, which sits between the syslog server and HF2, is not facing any memory issue.
In this situation, will increasing the memory on HF2 help? Or what would be the best solution to avoid this scenario in the future?
Hi @Raghavsri
Increasing memory on HF2 may provide temporary relief but does not address the root cause. Excessive pipeline queue size (2500MB) can cause splunkd to consume large amounts of memory, especially if data flow is uneven or downstream components are slow. You also risk losing larger volumes of data if Splunk/system crashes because all the data in the queues will be lost.
Queues should really be used as a buffer, not to expand throughput.
I would suggest reducing the queue sizes back towards the defaults and looking at why the downstream destinations (SyslogNG and the second indexer cluster) cannot keep up, rather than buffering ever more data in memory.
Ultimately, a large pipeline queue can mask underlying issues and lead to memory exhaustion. Memory upgrades alone will not prevent future hangs if the queues are oversized or downstream issues persist.
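As a first check, a search along these lines should show whether any queue on HF2 actually fills up before the hang (this assumes the standard group=queue fields from metrics.log; adjust the host filter for your HF2):

index=_internal sourcetype=splunkd source=*metrics.log* group=queue host=<your_HF2>
| eval pct_full=round(current_size_kb/max_size_kb*100,1)
| timechart span=1m max(pct_full) by name

The same group=queue events also carry blocked=true when a queue is blocked, which is a quicker signal of where the backpressure starts.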
🌟 Did this answer help you? If so, please consider marking it as the solution or giving it kudos; your feedback encourages the volunteers in this community to continue contributing.
In splunkd.log on HF2 I can see these entries:
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_host_thruput, ingest_pipe=2, series="lmpsplablr001", kbps=4.247, eps=24.613, kb=131.668, ev=763, avg_age=2.279, max_age=3
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_index_thruput, ingest_pipe=2, series="_internal", kbps=2.206, eps=13.032, kb=68.396, ev=404, avg_age=2.233, max_age=3
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_index_thruput, ingest_pipe=2, series="_metrics", kbps=2.041, eps=11.581, kb=63.272, ev=359, avg_age=2.331, max_age=3
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_source_thruput, ingest_pipe=2, series="/mnt/splunk/splunk/var/log/splunk/audit.log", kbps=0.000, eps=0.032, kb=0.000, ev=1, avg_age=0.000, max_age=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_source_thruput, ingest_pipe=2, series="/mnt/splunk/splunk/var/log/splunk/metrics.log", kbps=4.082, eps=23.355, kb=126.545, ev=724, avg_age=2.312, max_age=3
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_source_thruput, ingest_pipe=2, series="/mnt/splunk/splunk/var/log/splunk/splunkd_access.log", kbps=0.165, eps=1.226, kb=5.123, ev=38, avg_age=1.711, max_age=3
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_sourcetype_thruput, ingest_pipe=2, series="splunk_audit", kbps=0.000, eps=0.032, kb=0.000, ev=1, avg_age=0.000, max_age=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_sourcetype_thruput, ingest_pipe=2, series="splunk_metrics_log", kbps=2.041, eps=11.677, kb=63.272, ev=362, avg_age=2.312, max_age=3
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_sourcetype_thruput, ingest_pipe=2, series="splunkd", kbps=2.041, eps=11.677, kb=63.272, ev=362, avg_age=2.312, max_age=3
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=per_sourcetype_thruput, ingest_pipe=2, series="splunkd_access", kbps=0.165, eps=1.226, kb=5.123, ev=38, avg_age=1.711, max_age=3
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=tcpout_my_syslog_group, max_size=512000, current_size=0, largest_size=0, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=tcpout_primary_indexers, max_size=512000, current_size=0, largest_size=219966, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=aggqueue, max_size_kb=2560000, current_size_kb=0, current_size=0, largest_size=260, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=indexqueue, max_size_kb=2560000, current_size_kb=0, current_size=0, largest_size=8, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=nullqueue, max_size_kb=2560000, current_size_kb=0, current_size=0, largest_size=1, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=parsingqueue, max_size_kb=2560000, current_size_kb=0, current_size=0, largest_size=2, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=syslog_system, max_size_kb=97, current_size_kb=0, current_size=0, largest_size=1, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=syslog_system2, max_size_kb=97, current_size_kb=0, current_size=0, largest_size=1, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=queue, ingest_pipe=2, name=typingqueue, max_size_kb=2560000, current_size_kb=0, current_size=0, largest_size=257, smallest_size=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=syslog_connections, ingest_pipe=2, syslog_system2:x.x.x.x:514:x.x.x.x:514, sourcePort=8089, destIp=x.x.x.x, destPort=514, _tcp_Bps=0.00, _tcp_KBps=0.00, _tcp_avg_thruput=3.71, _tcp_Kprocessed=2266, _tcp_eps=0.00
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=syslog_connections, ingest_pipe=2, syslog_system:y.y.y.y:514:y.y.y.y:514, sourcePort=8089, destIp=y.y.y.y, destPort=514, _tcp_Bps=0.00, _tcp_KBps=0.00, _tcp_avg_thruput=0.35, _tcp_Kprocessed=213, _tcp_eps=0.00
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=tcpout_connections, ingest_pipe=2, name=primary_indexers:z.z.z.z0:9997:0:0, sourcePort=8089, destIp=z.z.z.z0, destPort=9997, _tcp_Bps=1545.33, _tcp_KBps=1.51, _tcp_avg_thruput=1.51, _tcp_Kprocessed=45, _tcp_eps=0.50, kb=44.94
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=thruput, ingest_pipe=2, name=index_thruput, instantaneous_kbps=0.000, instantaneous_eps=0.000, average_kbps=0.000, total_k_processed=0.000, kb=0.000, ev=0
06-13-2025 12:34:19.086 +0800 INFO Metrics - group=thruput, ingest_pipe=2, name=thruput, instantaneous_kbps=4.505, instantaneous_eps=26.000, average_kbps=10.456, total_k_processed=6705.000, kb=139.648, ev=806, load_average=1.500
Any abnormalities in these entries? I suspect the issue is with HF2 only: when HF2 is stopped, everything else works fine. Once the HF2 service is started, it climbs from about 1GB to 50GB of memory usage out of 130GB, then HF1 also starts consuming memory and log ingestion stops, especially from the syslog server (the largest-volume input); the index fed by that input is affected first.
Hence increasing memory on HF2 does not seem helpful here.
Hi @Raghavsri ,
I had a similar issue in a past project.
Check the parsing rules: there may be some unoptimized regexes that require too much memory, especially regexes that start with ".*".
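As a purely hypothetical illustration (the stanza names and pattern are invented, not taken from your configuration), the difference is roughly this in a null-queue filtering transform:

# transforms.conf (hypothetical example)
# The leading and trailing ".*" force the regex engine to scan and backtrack across every long syslog line:
[slow_filter]
REGEX = .*debug.*
DEST_KEY = queue
FORMAT = nullQueue

# An unanchored plain pattern matches the same events far more cheaply:
[fast_filter]
REGEX = debug
DEST_KEY = queue
FORMAT = nullQueue

The same idea applies to SEDCMD and TRANSFORMS rules in props.conf on the heavy forwarder.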
Ciao.
Giuseppe
@Raghavsri
What's the version of Splunk you are running?
Also, to start with, check a few things:
Review logs: Look for errors, warnings, or abnormal behavior in splunkd.log
Check destination health: Ensure that SyslogNG and the second indexer cluster are healthy and accepting data efficiently
Forwarding bottleneck: if HF2 is not able to forward data fast enough (due to network, destination, or performance issues), the queues fill up and consume memory
Memory upgrade: Increasing memory on HF2 may help if the issue is due to legitimate high data volume and not a leak or misconfiguration. However, if the problem is a memory leak/bandwidth issue, increasing memory will only delay the inevitable crash
Load Balancing: Consider load balancing across multiple HFs if possible, to distribute the data load
Monitor memory usage: Set up alerts for high memory usage to detect issues early.
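For example, a scheduled alert against the introspection data could look like this (the host name is a placeholder and the 80% threshold is just a starting point; this assumes the standard Hostwide fields data.mem and data.mem_used):

index=_introspection sourcetype=splunk_resource_usage component=Hostwide host=<HF2_host>
| eval mem_used_pct=round('data.mem_used'/'data.mem'*100,1)
| where mem_used_pct > 80
| stats max(mem_used_pct) AS max_mem_used_pct by host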
Regards,
Prewin
Splunk Enthusiast | Always happy to help! If this answer helped you, please consider marking it as the solution or giving a kudos/Karma. Thanks!
We are running version 9.2.2.
So the queue fill-up and memory consumption on HF2 may be due to outgoing traffic? It would not be caused by the large volume of incoming data routed from HF1?
Yes, we plan to add one more HF in the HF2 layer for load balancing, but that will take some time; meanwhile we need to fix the current ongoing issue.
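When the additional HF is ready, the plan is simply to add it to the existing tcpout group on HF1, roughly like this (the group name and hostnames are placeholders, not our real configuration):

# outputs.conf on HF1 (sketch)
[tcpout:hf2_layer]
server = hf2a.example.com:9997, hf2b.example.com:9997
autoLBFrequency = 30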