Getting Data In

What are Best Practices for Pipeline parallelization? (performance questions)

lukasmecir
Path Finder

Hello,

I have a question about pipeline parallelization. From the documentation and other sources, I understand that it is safe to enable pipeline parallelization if I have plenty of free resources in my Splunk deployment, particularly CPU cores. In other words, if the CPUs on the indexers or heavy forwarders are "underutilized". But my question is: what does "underutilized" mean in numbers, especially in a distributed environment? Example: let's imagine I have an indexer cluster of 8 nodes with 16 CPU cores each. In the Monitoring Console (historical charts) I see an average CPU load of 40%, a median CPU load of 40%, and a maximum CPU load between 70% and 100%. In my opinion it is not safe to enable parallelization in this environment. Is that right? But when is it safe: if the maximum load is under 50%? Or 25%? Which factors should I take into account, and what numbers are "safe"? Could you please share your experience or point me to an available guide? Thank you very much in advance.

Best regards

Lukas Mecir

0 Karma
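
As a rough illustration of the numbers in the question above, the sketch below converts the quoted CPU load percentages into idle cores per 16-core node. It is plain arithmetic only: the helper name `idle_cores` is hypothetical, and the thread does not state how much idle capacity counts as "safe", so no threshold is applied.

```python
def idle_cores(cores_per_node: int, cpu_load_pct: float) -> float:
    """Return how many cores' worth of CPU is idle at the given load percentage."""
    if not 0 <= cpu_load_pct <= 100:
        raise ValueError("cpu_load_pct must be between 0 and 100")
    return cores_per_node * (1 - cpu_load_pct / 100)


# Figures taken from the question: 16 cores per node, 40% average and median,
# and peaks somewhere between 70% and 100%.
CORES_PER_NODE = 16
SCENARIOS = [
    ("average", 40),
    ("median", 40),
    ("peak, low end", 70),
    ("peak, high end", 100),
]

for label, load in SCENARIOS:
    spare = idle_cores(CORES_PER_NODE, load)
    print(f"{label:>15}: {load:3d}% load, about {spare:.1f} idle cores per node")
```

Seen this way, the average leaves several cores free, while the high end of the peak range leaves none, which is why the peak matters as much as the average when judging headroom.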

isoutamo
SplunkTrust

Hi

First, do you have issues with ingestion, e.g. delays in getting data in fast enough? And if so, are they due to delays in the HF/IDX before the indexing processor (writing to disk)? If not, I don't see any reason to add more pipelines. If you have parsing (or similar) issues on the IDX side, you can add more pipelines to try to help the situation. Just add one and look at what happens to the CPU. Did this solve your ingestion issues, or did it just move them to the disk-writing side? Or did it lead to an unbalanced distribution of events across the indexers?

r. Ismo

0 Karma

lukasmecir
Path Finder

Hi,

Yes, I have ingestion issues (delays in getting data in), and they are due to delays in the parsing process on the HF/IDX (this was verified by Splunk support). So now I am considering two options:

  1. Optimizing the parsing process (especially the REGEXes)
  2. Adding more pipelines (in case the parsing optimization does not produce the expected result)

Regards

Lukas

0 Karma

isoutamo
SplunkTrust

If/when you have HFs in front of the IDXs and you have verified that the issue is in the parsing phase, then you could start by adding a second pipeline on those HFs and see what happens (at both the OS and the Splunk level). At the OS level, use the normal OS tools like nmon, top etc. to see what the situation is before and after the change. At the Splunk level, I prefer to use the MC to check what is going on. You could/should add those HFs to the MC as indexers. Just create separate groups for the HFs and the real indexers so that you can follow up and monitor them as needed.

Then you should also check those REGEXes to see if there is something that can be fixed. A good tool for that is regex101.com. It also shows the cost of a regex and how many steps it needs.

r. Ismo

0 Karma
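
To complement the regex101.com suggestion above, here is a minimal sketch of comparing two candidate patterns locally. Everything in it is an assumption made for illustration: the sample event, both patterns, and the helper name `extract` are hypothetical and not taken from the thread. It also uses Python's `re` engine rather than the PCRE engine Splunk uses, so the timings only indicate a relative difference and are not Splunk measurements.

```python
import re
import timeit

# Hypothetical sample event (not from the thread): a long line with the
# fields of interest near the end.
EVENT = "2024-01-01T00:00:00Z host=web01 " + "x=1 " * 200 + "user=alice status=200"

# Loose pattern: the leading and inner ".*" can cause extra backtracking.
LOOSE = re.compile(r".*user=(.*) status=(.*)")
# Tighter pattern: no leading wildcard, and character classes that stop
# at the field boundary.
TIGHT = re.compile(r"user=(\S+) status=(\d+)")


def extract(pattern, text):
    """Return the captured groups as a tuple, or None if there is no match."""
    match = pattern.search(text)
    return match.groups() if match else None


# Both patterns should extract the same values before their cost is compared.
print("loose:", extract(LOOSE, EVENT))
print("tight:", extract(TIGHT, EVENT))

RUNS = 2000
loose_s = timeit.timeit(lambda: extract(LOOSE, EVENT), number=RUNS)
tight_s = timeit.timeit(lambda: extract(TIGHT, EVENT), number=RUNS)
print(f"loose pattern: {loose_s:.4f} s for {RUNS} runs")
print(f"tight pattern: {tight_s:.4f} s for {RUNS} runs")
```

Checking that both patterns return the same groups first avoids "optimizing" a regex into one that extracts something different.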