We saw a spike in memory usage on one of our clustered search heads. The spike lasted around 12 hours. Comparing splunkd.log across all search heads, only the impacted search head showed something different. The warning in splunkd.log looks like this:
Spent 10777ms reaping search artifacts in /opt/splunk/var/run/splunk/dispatch
Can anyone help me find out if the above would cause an excessive use of memory?
I ran across the same error. I believe it is related to an incomplete (timed out) bundle push.
Error
cluster-master : (/opt/splunk/var/log/splunk)
splunk $ grep -i bundle splunkd.log
04-27-2018 16:05:30.747 +0000 WARN PeriodicReapingTimeout - Spent 18915ms reaping replicated bundles in $SPLUNK_HOME/var/run/searchpeers
04-27-2018 19:03:23.439 +0000 WARN PeriodicReapingTimeout - Spent 11606ms reaping replicated bundles in $SPLUNK_HOME/var/run/searchpeers
04-27-2018 19:03:56.354 +0000 WARN PeriodicReapingTimeout - Spent 14195ms reaping replicated bundles in $SPLUNK_HOME/var/run/searchpeers
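To see how often and how badly the reaper is timing out, it may help to pull just the durations out of the log. This is a rough sketch, not an official tool: the function name `reap_durations` is made up for illustration, and it assumes the warning text has the same shape as the lines above.

```shell
#!/bin/sh
# Hypothetical helper: print "<milliseconds> <directory>" for every
# PeriodicReapingTimeout warning found in the given splunkd.log.
# Assumes the message format "Spent <N>ms reaping <what> in <dir>".
reap_durations() {
    grep 'PeriodicReapingTimeout' "$1" |
        sed -n 's/.*Spent \([0-9][0-9]*\)ms reaping .* in \(.*\)$/\1 \2/p'
}
```

For example, `reap_durations /opt/splunk/var/log/splunk/splunkd.log | sort -rn | head` would list the slowest reaping passes first (adjust the log path to your install).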
Doing the following (on the Cluster Master) resolved it:
Restart CM
splunk enable maintenance-mode
(Optional: splunk show maintenance-mode)
splunk restart
splunk disable maintenance-mode
Rolling Restart & Confirm
splunk rolling-restart cluster-peers
Wait 30 mins. (Time depends on the number of indexers; this environment has 18.)
splunk show cluster-bundle-status
Bundle Push
If the peers are not all displaying the same active bundle, do a bundle push.
splunk apply cluster-bundle
(Wait 30 mins)
splunk show cluster-bundle-status
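The "Restart CM" part of the steps above could be wrapped in a small script. This is only a sketch: `SPLUNK_BIN`, `DRY_RUN`, `run` and `restart_cm` are names invented here, the binary path is an assumption based on the /opt/splunk paths in this thread, and the splunk commands may prompt for credentials or confirmation when run for real. By default it only prints what it would run.

```shell
#!/bin/sh
# Assumed location of the splunk binary; override via the environment.
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"

# Hypothetical wrapper: echo the command unless DRY_RUN is set to 0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Restart the cluster master inside maintenance mode, stopping at the
# first step that fails. Uses only the commands listed in the steps above.
restart_cm() {
    run "$SPLUNK_BIN" enable maintenance-mode &&
    run "$SPLUNK_BIN" show maintenance-mode &&
    run "$SPLUNK_BIN" restart &&
    run "$SPLUNK_BIN" disable maintenance-mode
}
```

Calling `restart_cm` prints the four commands; setting `DRY_RUN=0` first makes it execute them. The rolling restart and bundle push are left manual because of the waits between them.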
I'm seeing the same message; however, it is appearing on my IDX cluster peers. It corresponds to a spike in CPU usage on the affected node. Seen in splunkd.log:
WARN PeriodicReapingTimeout - Spent 57296ms reaping search artifacts in ./var/run/splunk/dispatch
WARN TcpInputProc - Stopping all listening ports. Queues blocked for more than 300 seconds
Bucket replication errors follow as peers try to stream to the affected node.
What might be causing the Timeout?
The message indicates that Splunk took 10.777 seconds removing expired search artifacts from the dispatch directory. I suspect that this warning message is more of a symptom than a cause. But it's hard to say with the information at hand.
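One thing that might be worth checking is how many artifacts are sitting in the dispatch directory, since a very large number could plausibly make each reaping pass slow (that is a guess, not something confirmed in this thread). A minimal sketch, with `count_artifacts` being a made-up helper name and relying on the `-mindepth`/`-maxdepth` options found in GNU and BSD find:

```shell
#!/bin/sh
# Hypothetical helper: count the top-level subdirectories of a dispatch
# directory, on the assumption that each search artifact is one directory.
count_artifacts() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}
```

For example, `count_artifacts /opt/splunk/var/run/splunk/dispatch` on the affected node, compared with the same count on a healthy node, would show whether the affected one is holding noticeably more artifacts.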