I believe I've identified the root cause: slow disk for the cold DB. Our configuration keeps the hot/warm DBs on locally attached (virtually, anyway) disks and points the cold DBs to a CIFS share on a NetApp, so indexes.conf looks something like this:
[databases]
coldPath = \netapp\splunk\SplunkIndex02\DATA_2\databases\colddb
homePath = F:\CustomIndex\DATA_2\databases\db
thawedPath = \netapp\splunk\SplunkIndex02\DATA_2\databases\thaweddb
maxWarmDBCount = 32
So I created another locally attached drive and used it as the coldPath on ONE of our three indexers. After 4 hours, we have not seen ANY blocking on the indexer with the "locally attached" drive, while the other indexers continue to see blocking at the same rate as before. In this particular case, the slow disk was the cold DB. If there were a way to have Splunk roll the files to cold on a schedule, rather than constantly, this would not be a problem.
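For anyone wanting to try the same test, the change on the one indexer would look roughly like the sketch below. The G: drive letter and the path under it are placeholders I made up for illustration, not the actual values; everything else is carried over from the stanza above.

```
# indexes.conf on the ONE test indexer (sketch only)
[databases]
# ASSUMPTION: G:\ is a hypothetical locally attached drive; substitute your own path
coldPath = G:\CustomIndex\DATA_2\databases\colddb
# unchanged: hot/warm buckets stay on the existing local disk
homePath = F:\CustomIndex\DATA_2\databases\db
# unchanged: thawed data still points at the NetApp share
thawedPath = \netapp\splunk\SplunkIndex02\DATA_2\databases\thaweddb
maxWarmDBCount = 32
```

Only coldPath differs from the original stanza, so any change in blocking on that indexer can be attributed to where the cold buckets land.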