Splunk Enterprise

The index processor has paused data flow - How to optimize?

edoardo_vicendo
Contributor

Hello,

We are still facing the following issue when we put our Indexer Cluster in maintenance mode and stop one Indexer.

Basically all the Indexers stop ingesting data, increasing their queues, waiting for splunk-optimize to finish the job.

This usually happens when a long time has passed since the Indexer was last stopped.

Below is an example of the error message that appears on all the Indexers at once, each for a different bucket directory:

throttled: The index processor has paused data flow. Too many tsidx files in idx=myindex bucket="/xxxxxxx/xxxx/xxxxxxxxxx/splunk/db/myindex/db/hot_v1_648" , waiting for the splunk-optimize indexing helper to catch up merging them. Ensure reasonable disk space is available, and that I/O write throughput is not compromised.

 

 

Checking further, inside the bucket directory I could see hundreds of .tsidx files. The job of splunk-optimize is to merge those .tsidx files.
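To see which hot buckets are piling up .tsidx files while splunk-optimize catches up, the check above can be scripted. This is only a minimal Python sketch, not a Splunk tool: the function names and the threshold are hypothetical, and it assumes hot buckets are directories named hot_* directly under the index's db path, as in the bucket path shown in the message.

```python
import sys
from pathlib import Path
from typing import Dict


def count_tsidx_per_hot_bucket(index_db_dir: str) -> Dict[str, int]:
    """Return {bucket directory name: number of .tsidx files} for hot buckets.

    Assumption: hot buckets are directories named hot_* directly under
    index_db_dir (for example .../splunk/db/myindex/db).
    """
    counts = {}
    for bucket in Path(index_db_dir).glob("hot_*"):
        if bucket.is_dir():
            counts[bucket.name] = sum(1 for _ in bucket.glob("*.tsidx"))
    return counts


def buckets_over(counts: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Keep only buckets with more .tsidx files than the given threshold.

    The threshold is an arbitrary illustration value, not a Splunk setting.
    """
    return {name: n for name, n in counts.items() if n > threshold}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_tsidx.py <path to the index db directory>
    for name, n in sorted(count_tsidx_per_hot_bucket(sys.argv[1]).items()):
        print(f"{name}\t{n}")
```

Running it on each Indexer during a restart would show whether the file counts keep growing or whether splunk-optimize is slowly bringing them down.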

 

We are running Splunk Enterprise 9.0.2 and:

- on each Indexer, the disk reaches 150K IOPS

- we have already applied the following settings, which reduced the impact but did not solve the problem:

indexes.conf

[default]
maxRunningProcessGroups = 12
processTrackerServiceInterval = 0

 

 

Note: we kept maxConcurrentOptimizes at its default of 6, because maxConcurrentOptimizes must stay <= maxRunningProcessGroups. Splunk support confirmed this and also told me that maxConcurrentOptimizes has been unused (or has had less effect) since 7.x and is kept mainly for compatibility.
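The constraint described in the note can be checked before a configuration is pushed to the Indexers. The snippet below is a rough Python sketch under several assumptions: it treats the stanza as plain INI (real Splunk .conf layering and precedence are not modelled), the function name is hypothetical, and the fallback of 6 is simply the default quoted in this thread.

```python
import configparser
from typing import List

# Default for maxConcurrentOptimizes as quoted in this thread (assumption).
DEFAULT_MAX_CONCURRENT_OPTIMIZES = "6"


def check_optimize_settings(conf_text: str, stanza: str = "default") -> List[str]:
    """Return a list of problems found in one indexes.conf stanza.

    Only checks maxConcurrentOptimizes <= maxRunningProcessGroups.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the camelCase keys exactly as written
    parser.read_string(conf_text)
    if not parser.has_section(stanza):
        return [f"stanza [{stanza}] not found"]
    section = parser[stanza]
    groups = section.get("maxRunningProcessGroups")
    if groups is None:
        return ["maxRunningProcessGroups is not set in this stanza"]
    optimizes = int(
        section.get("maxConcurrentOptimizes", fallback=DEFAULT_MAX_CONCURRENT_OPTIMIZES)
    )
    if optimizes > int(groups):
        return [
            f"maxConcurrentOptimizes ({optimizes}) exceeds "
            f"maxRunningProcessGroups ({int(groups)})"
        ]
    return []


example = """
[default]
maxRunningProcessGroups = 12
processTrackerServiceInterval = 0
"""
print(check_optimize_settings(example))  # prints []
```

With the settings above the check passes, because the assumed default of 6 is below 12.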

- I know that since 9.0.x it is possible to run splunk-optimize manually on the affected buckets, but this seems to me more of a workaround than a solution. Since a deployment can have multiple Indexers, it is not straightforward.

 

What do you suggest to solve this issue?

 

Thanks a lot,

Edoardo

hrawat_splunk
Splunk Employee

A fix for this type of scenario is in 9.1 (the next major release). In the meantime, try the following workaround to reduce the outage.

In server.conf
[queue=indexQueue]
maxSize=500MB

In indexes.conf
[default]
throttleCheckPeriod=5
maxConcurrentOptimizes=1
maxRunningProcessGroups=32 
processTrackerServiceInterval=0

edoardo_vicendo
Contributor

Replying here as well, with reference to this post:

https://community.splunk.com/t5/Getting-Data-In/Why-has-the-index-process-paused-data-flow-How-to-ha...

 

We ended up with this configuration:

In server.conf
[queue=indexQueue]
maxSize=500MB

In indexes.conf
[default]
throttleCheckPeriod=5
maxConcurrentOptimizes=2
maxRunningProcessGroups=32 
processTrackerServiceInterval=0

 

This way we get both benefits:

  • when we restart the cluster, we no longer get the IndexWriter message
  • during normal operation, we no longer get the HealthChangeReporter or PeriodicHealthReporter messages

Thanks a lot for your suggestion!

dnavara
Explorer

Hi, we are seeing the same issue at peak times.
Currently the bucket size is set to auto (750MB); for a few of our indexes this means rolling 24 hot buckets to warm every minute.

Could increasing this to auto_high_volume help, since the buckets wouldn't need to be rolled as often?

hrawat_splunk
Splunk Employee

Your scenario is slightly different from the issue reported in this post.
Do you see the paused log messages across all (or the majority of) indexers at the same time?
Also, are the 24 hot buckets rolled to warm per minute across all indexers, or on one indexer?

dnavara
Explorer

We see the following message on most of the indexers (it varies anywhere from 1 to 4):

throttled: The index processor has paused data flow. Too many tsidx files in idx=myindex bucket="/xxxxxxx/xxxx/xxxxxxxxxx/splunk/db/myindex/db/hot_v1_648" , waiting for the splunk-optimize indexing helper to catch up merging them. Ensure reasonable disk space is available, and that I/O write throughput is not compromised.

 

It's 24 hot buckets rolled to warm per minute across all indexers. Currently we have 4 indexers, each with 2 parallel pipelines. Two of our indexes see this rate of rolling; the rest ingest a lot less data.
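As a rough back-of-the-envelope check on the auto_high_volume question, the figures quoted in this thread can be turned into roll rates. This is only a sketch under loud assumptions: it assumes every bucket rolls because it reaches its maximum size (buckets can also roll for other reasons), that the load is spread evenly over the pipelines, and that auto_high_volume corresponds to 10GB buckets on 64-bit systems, which is worth verifying against the indexes.conf documentation for your version.

```python
def rolls_per_minute(volume_mb_per_min: float, bucket_size_mb: float) -> float:
    """Buckets rolled per minute if each bucket rolls only when it is full."""
    return volume_mb_per_min / bucket_size_mb


# Figures quoted in this thread.
CURRENT_ROLLS_PER_MIN = 24
AUTO_BUCKET_MB = 750
INDEXERS = 4
PIPELINES_PER_INDEXER = 2

# Assumption: auto_high_volume means 10GB buckets (64-bit systems).
AUTO_HIGH_VOLUME_BUCKET_MB = 10 * 1024

volume_mb_per_min = CURRENT_ROLLS_PER_MIN * AUTO_BUCKET_MB  # 18000 MB per minute
pipelines = INDEXERS * PIPELINES_PER_INDEXER  # 8 pipelines in total
current_per_pipeline = CURRENT_ROLLS_PER_MIN / pipelines  # 3 rolls per minute
new_total = rolls_per_minute(volume_mb_per_min, AUTO_HIGH_VOLUME_BUCKET_MB)
new_per_pipeline = new_total / pipelines

print(f"current: {current_per_pipeline:.2f} rolls/min per pipeline")
print(f"auto_high_volume: {new_total:.2f} rolls/min in total, "
      f"{new_per_pipeline:.2f} per pipeline")
```

Under these assumptions the roll rate would drop from about 3 per pipeline per minute to roughly one roll every 4 to 5 minutes per pipeline. Larger buckets reduce how often buckets roll, but they do not by themselves change how quickly .tsidx files are created inside a hot bucket.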

hrawat_splunk
Splunk Employee

@dnavara
Have you already applied the settings above? You only need to apply the indexes.conf settings and restart Splunk.

dnavara
Explorer

Hi, no, I haven't yet, but I'll give it a try. I am just wondering if it has anything to do with how often buckets need to be rolled.

hrawat_splunk
Splunk Employee

@dnavara 

If you see the following:

throttled: The index processor has paused data flow. Too many tsidx files in idx=myindex bucket="/xxxxxxx/xxxx/xxxxxxxxxx/splunk/db/myindex/db/hot_v1_648" , waiting for the splunk-optimize indexing helper to catch up merging them. Ensure reasonable disk space is available, and that I/O write throughput is not compromised.

That means ingestion is so high that tsidx files are getting created faster than Splunk can merge them. Add one more setting (maxMemMB).

In indexes.conf
[default]
throttleCheckPeriod=5
maxConcurrentOptimizes=1
maxRunningProcessGroups=32 
maxMemMB = 25
processTrackerServiceInterval=0


 

dnavara
Explorer

Hi, we see the same issue on Splunk 9.1.2.

What was the reason for lowering this to 1 from the default of 6?

maxConcurrentOptimizes=1

 

edoardo_vicendo
Contributor

dnavara
Explorer

Thanks 🙂
