How do I configure multiple indexing queues in indexes.conf? What is the impact if we configure two indexing queues and then reduce to one?

Contributor

We have a requirement that our architects think calls for multiple indexing queues.
Can anyone provide a reference example configuration for creating two queues?
Which conf files do we need to maintain?
And what is the impact of having two queues?

With reference to the above: suppose we find that multiple indexing queues do not work as expected. What is the impact of reducing the number of indexing queues back to one?

1 Solution

SplunkTrust

I believe the post is about parallelization settings, in particular index parallelization.
As per the documentation:
"Index parallelization allows an indexer to maintain multiple pipeline sets. A pipeline set handles the processing of data, from receiving streams of events, through event processing, and writing the events to disk. By allowing an indexer to create and operate multiple pipelines, multiple data streams can be processed with additional CPU cores, accelerating data parsing and disk writing up to the limits of the indexer's I/O capacity. Customers leveraging index parallelization can see an increase in an indexer's sustained indexing load, or a doubling of indexing speed when receiving a sudden surge of data from the forwarders. "

"Adjusting the parallelIngestionPipelines setting in server.conf to 2 will use an additional 4-6 CPU cores, and requires 300-400 IOPS to maintain indexing thruput on every indexer. Also, there are fewer CPU cores available for search processing. A value of 2 provides the best performance increase, with higher values succumbing to diminishing returns. For configuration details, see Manage pipeline sets for index parallelization in the Splunk Enterprise Managing Indexers and Clusters of Indexers Manual "

If you have the appropriate capacity on the indexers, you can use multiple pipelines, and this will provide multiple queues (as each pipeline has its own set of queues).

As for dropping the number down again, or back to 1, nothing too interesting will happen: you would just be reducing the amount of data you can index and also reducing the CPU/IO load on the indexer.
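
As a minimal sketch of the setting quoted above, assuming a default installation and an indexer with the spare CPU and I/O capacity the documentation calls for, the change would go in a local copy of server.conf (for example $SPLUNK_HOME/etc/system/local/server.conf) rather than in indexes.conf. The file location and the value of 2 are illustrative assumptions, and splunkd needs a restart for the change to take effect.

```
# server.conf (local copy, not the default file)
[general]
# Number of pipeline sets. Each set has its own set of queues
# (parsing, aggregation, typing, indexing). The default is 1.
parallelIngestionPipelines = 2
```

Reverting should be the same edit in reverse: set the value back to 1 (or remove the line) and restart.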


Contributor

Your answer is very close to what I am trying to understand.
None of these documents describe multiple queues in an indexer (such as 2 parsing queues or 2 indexing queues). I understand that scaling is achieved by adding indexers; however, that is not what I am looking for.
Please correct me if I am wrong.

SplunkTrust

The definition isn't obvious; perhaps you could send some feedback on the documentation?

Data pipeline (from Splexicon):

The route that data takes through Splunk Enterprise, from its origin in sources such as log files and network feeds, to its transformation into searchable events that encapsulate valuable knowledge. The data pipeline includes these segments:

Input
Parsing
Indexing
Search

So yes, you will have 2 indexing queues, 2 aggregation queues, et cetera (assuming you increase the index parallelization setting to 2).

Contributor

Based on analysis and a few trials on a sandbox with limited data, the explanation is clear, so I am accepting this answer.

Ultra Champion

I think the terminology is what's confusing here.

You can't really have more than one queue (hence my previous emphasis on "desired").

[Edit: I can't add an image to this answer, so see here for an overview: https://wiki.splunk.com/Community:HowIndexingWorks ]

Each indexer has a number of queues, and each queue marks the transition from one stage of processing to the next. It's not possible to add more "parallel queues" on an indexer.

If you want to increase indexing performance, add more indexers (which, in a sense, adds more queues).

Ultra Champion

I think this depends on what you are trying to achieve. What is the purpose of the desired second queue?

I wonder if what is being referred to is routing, i.e., sending one type of event data to a specific indexer and everything else to a standard indexer (perhaps because some of the data is more sensitive).
http://docs.splunk.com/Documentation/Splunk/7.0.1/Forwarding/Routeandfilterdatad

In terms of long-term support, if you ever needed to reverse this, you could either:
point both routes at the same destination,
or remove the routing completely
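
A rough sketch of the kind of routing described here, assuming a heavy forwarder sits in front of two indexer groups. The sourcetype, transform name, group names, and host names below are all hypothetical placeholders, and the catch-all REGEX is only for illustration.

```
# props.conf (on the heavy forwarder): attach a routing transform to a sourcetype
[sensitive_sourcetype]
TRANSFORMS-routing = route_sensitive

# transforms.conf: send matching events to the sensitive_indexers group
[route_sensitive]
REGEX = .
DEST_KEY = _TCP_ROUTING
FORMAT = sensitive_indexers

# outputs.conf: everything else goes to the default group
[tcpout]
defaultGroup = standard_indexers

[tcpout:standard_indexers]
server = idx-standard.example.com:9997

[tcpout:sensitive_indexers]
server = idx-sensitive.example.com:9997
```

Under these assumptions, reversing the setup would mean either listing the same server under both groups or removing the TRANSFORMS-routing line so all data follows defaultGroup.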

Contributor

There is no specific requirement for this. We want to understand the various mechanisms for improving overall performance, and one place to look is queueing.
We just wanted to understand how to configure multiple queues. Are there any examples?
And what is the impact on performance (such as CPU usage and memory usage)?

What is the major difference between having a single queue and having multiple queues?

Communicator

Does he truly mean queues, not multiple indexers?
I'd rather suggest you add another indexer to your environment to boost performance.

If queues are meant, then check this file:
/opt/splunk/etc/system/default/server.conf
I have never done this, but you should find some hints about queues there.

Here is the definition of the parsingQueue in server.conf:
[queue=parsingQueue]
maxSize = 6MB
# look back time in minutes
cntr_1_lookback_time = 60s
cntr_2_lookback_time = 600s
cntr_3_lookback_time = 900s
# sampling frequency is the same for all the counters of a particular queue
# and defaults to 1 sec
sampling_interval = 1s
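
If the goal were to resize an existing queue rather than add one, a minimal sketch would be to override the stanza in a local file instead of editing the default one. The path assumes the same /opt/splunk installation as above, and the 10MB value is purely a hypothetical example, not a recommendation.

```
# /opt/splunk/etc/system/local/server.conf
[queue=parsingQueue]
# Hypothetical value; the shipped default shown above is 6MB
maxSize = 10MB
```

Note that this only changes how much data the one parsing queue can buffer; it does not create a second queue.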
