The total indexing throughput per indexer was reduced significantly after upgrading to 5.0 or 5.0.1 from 4.3.x. Splunk is spending considerable amounts of CPU time on service_maxSizes. Due to this issue, forwarder connections are being refused by the indexers. What is going on here?
Splunk is aware of this issue, and there is a work-around. If you have indexes with values set for homePath.maxDataSizeMB or coldPath.maxDataSizeMB in indexes.conf, you can mitigate the issue by editing the appropriate copy of indexes.conf to disable serviceOnlyAsNeeded.
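As a rough sketch of what that edit might look like, assuming the setting takes a boolean and that you want it applied to every index through the [default] stanza (both are assumptions here, so check the indexes.conf spec for your version):

```
# indexes.conf, in whichever copy defines your indexes
# (for example under $SPLUNK_HOME/etc/system/local/; the location is an assumption)

[default]
# Assumed value for "disabled"; set it in an individual index stanza
# instead if you only want to change specific indexes.
serviceOnlyAsNeeded = false
```

You will most likely need to restart splunkd on the indexer for the change to take effect.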
Here is a search that you can run to compare the time spent on service_maxSizes (and the other indexer subtasks) before and after applying the work-around:
index=_internal host=<indexer_hostname> source=*metrics.log*
group=subtask_seconds | fields replicate_semislice, sync_hotBkt,
throttle_optimize, flushBlockSig, retryMove_1hotBkt, size_hotBkt,
roll_hotBkt, chillOrFreeze, update_checksums, fork_recovermetadata,
rebuild_metadata, update_bktManifest, service_volumes, service_maxSizes,
service_externProc | timechart minspan=30s sum(*) AS *