We have a high-volume syslog input configured on a HF running Splunk 7.2.5, and we started noticing the TailReader-0 pipeline reporting warnings that it cannot process more data because the queue is full. We saw that only 1 core on the machine was at 100% utilization, so we thought it was time to enable parallel ingestion pipelines and set "parallelIngestionPipelines=2".
There are now 2 pipelines, but the second one is not used at all. Checking in the DMC, only the first pipeline is at 100% utilization while the second stays at 0.
Why is it behaving like that? Is it because the data input is still one and the same (e.g. syslog/udp:514)? The machine has plenty of unused resources (8 cores, of which only 1 is currently utilized). Any advice?
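For reference, this is the only change we made (assuming the usual local config location):

```
# $SPLUNK_HOME/etc/system/local/server.conf
[general]
parallelIngestionPipelines = 2
```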
Yeah, if you just have a single UDP input, then that traffic will not be split across 2 pipelines.
For syslog the recommended approach is to use a real syslog daemon (rsyslog / syslog-ng) to receive the syslog traffic and write it to files (e.g. split by sending host), and then set up one or more file monitor inputs to ingest the data into Splunk. I'm not 100% sure whether you would need multiple file monitor inputs to benefit from multiple pipelines; a single monitor input that picks up multiple files may work as well.
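As a rough sketch of that setup (file paths, sourcetype and index are placeholders, adjust to your environment), rsyslog can listen on UDP 514 and write one file per sending host, and a single monitor stanza on the HF then picks up all of those files:

```
# /etc/rsyslog.d/splunk.conf -- receive UDP syslog, one file per sending host
module(load="imudp")
input(type="imudp" port="514")
template(name="PerHostFile" type="string" string="/var/log/remote/%HOSTNAME%.log")
action(type="omfile" dynaFile="PerHostFile")

# inputs.conf on the HF -- one stanza monitors all the per-host files
[monitor:///var/log/remote/*.log]
sourcetype = syslog
index = main
```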
Also: if you have queueing issues, make sure to have a close look at exactly which queue is the bottleneck. There's not much point in adding more processing power on your HF if the bottleneck is at your indexer (cluster) or the network bandwidth between the HF and the indexer(s). That said, an advantage of 2 pipelines on the HF is that each pipeline would also be sending to its own indexer (assuming you have multiple indexers and use auto load balancing output on the HF), so it would help spread the load on the indexing layer as well.
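To see which queue is filling up, a search along these lines against the HF's internal metrics can help (host name is a placeholder; adjust the time range as needed):

```
index=_internal host=<your_hf> source=*metrics.log* group=queue
| eval pct_full = round(current_size_kb / max_size_kb * 100, 1)
| timechart span=5m max(pct_full) by name
```

Whichever queue sits at or near 100% first is typically the bottleneck; queues downstream of it will usually look empty.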