Hi,
We have a high-volume syslog input configured on a HF running Splunk v7.2.5, and we started noticing the TailReader-0 pipeline reporting warning messages that it cannot process more data because the queue is full. We saw that only 1 core on the machine was at 100% utilization, so we decided it was time to enable parallel ingestion pipelines and changed the configuration with "parallelIngestionPipelines = 2".
We can see that there are now 2 pipelines, but the second one is not used at all. Checking in the DMC, only the first pipeline is at 100% while the second stays at 0.
Why is it behaving like that? Is it because the data input is still one and the same (e.g. syslog/udp:514)? We have plenty of unused resources on that machine (e.g. 8 cores, of which only 1 is currently utilized). Any advice?
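For clarity, the change is in server.conf on the HF, i.e. something like:

[general]
parallelIngestionPipelines = 2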
Hi @vanvan ,
I had a similar issue; my hint is to use an rsyslog (or syslog-ng) server to receive the syslog traffic and write it to files, then use the HF to read those files, process them, and send them to the Indexers.
In this way you have two advantages: you don't lose events while Splunk on the HF is restarting or busy (rsyslog keeps writing to the files), and you can organize the files (e.g. one per sending host) to manage sourcetypes and parsing more easily.
Then, if you have fast disks and you don't have a slow network, you can use parallel pipelines to make better use of your CPUs; a sketch of this setup follows below.
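A minimal sketch of this layout, assuming rsyslog and example ports, paths and names (adjust everything to your environment):

# rsyslog (e.g. /etc/rsyslog.d/remote-syslog.conf): listen on udp/514 and write one file per sending host
module(load="imudp")
template(name="PerHostFile" type="string" string="/var/log/remote/%HOSTNAME%/syslog.log")
ruleset(name="remote_syslog") {
    action(type="omfile" dynaFile="PerHostFile")
}
input(type="imudp" port="514" ruleset="remote_syslog")

# inputs.conf on the HF: monitor the directory written by rsyslog
[monitor:///var/log/remote]
sourcetype = syslog
host_segment = 4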
How many CPUs do your HFs have? I had 16 CPUs and moved to 24 to get better-performing queues.
Then you could optimize your configuration by enlarging the queues; you can check queue usage from your search heads by running this search:
index=_internal source=*metrics.log sourcetype=splunkd group=queue
| eval name=case(name=="aggqueue","2 - Aggregation Queue",
name=="indexqueue", "4 - Indexing Queue",
name=="parsingqueue", "1 - Parsing Queue",
name=="typingqueue", "3 - Typing Queue",
name=="splunktcpin", "0 - TCP In Queue",
name=="tcpin_cooked_pqueue", "0 - TCP In Queue")
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=round((curr/max)*100,2)
| bin _time span=1m
| stats median(fill_perc) AS "fill_percentage" perc90(fill_perc) AS "90_perc" max(max) AS max max(curr) AS curr by host, _time, name
| where fill_percentage>70
| sort -_time
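If one queue is constantly full, you can also enlarge it in server.conf on the HF. This is only a sketch with example sizes, to be tuned against your available memory; the stanza names below correspond to the queues in the search above (check server.conf.spec for your version):

[queue=parsingQueue]
maxSize = 10MB

[queue=aggQueue]
maxSize = 10MB

[queue=typingQueue]
maxSize = 10MB

[queue=indexQueue]
maxSize = 10MB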
Last hint: check the regexes in your custom add-ons to avoid unnecessary overhead.
Ciao.
Giuseppe
Hi,
Have you tried changing this parameter?
pipelineSetSelectionPolicy = round_robin | weighted_random
If this parameter is set to "round_robin", that could explain your issue. Change it to "weighted_random" (on the indexers) and check in the Monitoring Console whether it gets better.
I think you can set "parallelIngestionPipelines" to 4 on your UF if it is configured to 2 on the indexers.
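For reference, a minimal sketch, assuming the parameter goes in the [general] stanza of server.conf and that your Splunk version supports it:

[general]
pipelineSetSelectionPolicy = weighted_random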
Yeah, if you just have a single UDP input, then that will not be split over 2 pipelines.
For syslog the recommended approach is to use a real syslog daemon (rsyslog / syslog-ng) to receive the syslog traffic and write it to files (e.g. split by sending host), and then set up one or more file monitor inputs to ingest the data into Splunk. Not 100% sure if you would need multiple file monitor inputs to benefit from multiple pipelines; perhaps a single input that has multiple files to process will work as well.
Also: if you have queueing issues, make sure to take a close look at exactly which queue is the bottleneck. There's not much point in adding more processing power on your HF if the bottleneck is at your indexer (cluster) or the network bandwidth between the HF and the indexer(s). That said, an advantage of 2 pipelines on the HF is that each would also be sending to its own indexer (assuming you have multiple indexers and use auto load-balanced output on the HF), so it would help spread the load on the indexing layer as well.
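For completeness, a hedged outputs.conf sketch of that auto load-balanced output on the HF (indexer names and port are placeholders):

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
autoLBFrequency = 30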