Getting Data In

parallelIngestionPipelines

verbal_666
Builder

Hello.

I'm currently running

parallelIngestionPipelines = 2

on my indexers, and it works.

The servers (Linux) are professional-grade, with 24 CPUs and 48 GB of RAM.

Has anyone ever tried

parallelIngestionPipelines = 4

on their indexers?

Does it work, or does it crash?

Thanks.

PrewinThomas
Motivator

@verbal_666 

parallelIngestionPipelines = 2 is considered the optimal setting for most deployments. Going beyond 2 is technically feasible but generally not advised unless you proceed with caution and have confirmed that your infrastructure can handle the additional load.

I tested with 4 (no higher) but saw instability, especially during bursty loads and when additional apps were introduced. For this reason, I’m keeping the setting at 2, which has proven more stable in my environment.

In theory, setting it to 4 lets you ingest more data in parallel, but it comes with a high risk of OOM and crashes. Splunk strongly recommends consulting Professional Services (PS) before going beyond 2.
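
For anyone wondering where this goes: the setting lives in the [general] stanza of server.conf on the indexer, and a restart is needed for a change to take effect. A minimal Python sketch that renders the stanza (the helper name render_pipeline_stanza is made up for illustration and is not part of any Splunk tooling):

```python
def render_pipeline_stanza(pipelines: int) -> str:
    """Return a server.conf fragment that sets parallelIngestionPipelines.

    Hypothetical helper for illustration only; it just formats text and
    does not touch any Splunk installation.
    """
    if pipelines < 1:
        raise ValueError("pipelines must be at least 1")
    return f"[general]\nparallelIngestionPipelines = {pipelines}\n"


# Render the stanza for the value discussed in this thread.
print(render_pipeline_stanza(2))
```

Keeping the value in one small helper like this makes it harder to push an unreviewed number (such as 4) to the indexers by accident.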


#https://help.splunk.com/en/splunk-enterprise/administer/manage-indexers-and-indexer-clusters/9.4/man...

 

Regards,
Prewin
Splunk Enthusiast | Always happy to help! If this answer helped you, please consider marking it as the solution or giving a Karma. Thanks!

PickleRick
SplunkTrust

As usual - "it depends".

During normal indexing, a single pipeline engages 4-6 CPUs. So if you have a host that does nothing but ingestion processing (a heavy forwarder, HF), you can raise the number of pipelines relatively harmlessly, and performance scales quite well (maybe not strictly linearly, but not much worse).

But on an indexer you have to remember two things:

1) You're still limited by having to write everything to disk at the end of the pipeline, so the performance improvement will be significantly less than linear.

2) Indexers typically spend most of their time searching, after all. Tying CPUs up in ingest processing leaves far fewer resources for searching, which can lead to long-running, delayed, or skipped searches.

So on a modern, reasonably sized box with a typical use case, 1 or 2 parallel ingestion pipelines does indeed seem to be the optimal setting. With a slightly atypical architecture (for example, a separate HF layer that does the heavy lifting while the indexers only receive parsed data and write it to disk), you could consider raising the parameter further.
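
To put rough numbers on this for the 24-CPU indexers from the question, here is a back-of-the-envelope sketch in Python. It assumes the 4-6 CPUs per pipeline estimate above holds and ignores disk I/O and everything else running on the box; the function and class names are only illustrative.

```python
from dataclasses import dataclass


@dataclass
class CpuBudget:
    pipelines: int
    ingest_low: int   # CPUs used if every pipeline takes the low estimate
    ingest_high: int  # CPUs used if every pipeline takes the high estimate
    left_best: int    # CPUs left for searching in the best case
    left_worst: int   # CPUs left for searching in the worst case


def estimate_cpu_budget(total_cpus, pipelines, per_pipeline=(4, 6)):
    """Rough CPU split between ingestion and search (assumption-based)."""
    low, high = per_pipeline
    ingest_low = pipelines * low
    ingest_high = pipelines * high
    return CpuBudget(
        pipelines=pipelines,
        ingest_low=ingest_low,
        ingest_high=ingest_high,
        left_best=max(total_cpus - ingest_low, 0),
        left_worst=max(total_cpus - ingest_high, 0),
    )


for n in (1, 2, 4):
    b = estimate_cpu_budget(24, n)
    print(f"{n} pipeline(s): ingest {b.ingest_low}-{b.ingest_high} CPUs, "
          f"{b.left_worst}-{b.left_best} left for search")
```

Under these assumptions, 2 pipelines leave roughly 12-16 of the 24 CPUs for searching, while 4 pipelines leave somewhere between 0 and 8, which is why searches are the first thing to suffer.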

verbal_666
Builder

CPU bottleneck.

gcusello
SplunkTrust

Hi @verbal_666 ,

I tried parallelIngestionPipelines = 4 but went back to 2: indexing was faster than with 2, but searches became slower.

Ciao.

Giuseppe

verbal_666
Builder

Perfect 👍👍👍

That's what I wanted to know 👏👏👍

Many thanks 👍
