Monitoring Splunk

What is meant by "Spent X ms reaping search artifacts"?

anwarmian
Communicator

We saw a spike in memory usage on one of the search heads in our cluster, and it persisted for around 12 hours. Comparing splunkd.log across all the search heads, the impacted one had something the others did not. The warning in splunkd.log looks like this:
Spent 10777ms reaping search artifacts in /opt/splunk/var/run/splunk/dispatch
Can anyone help me determine whether the above could cause excessive memory usage?
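
For reference, this is how I am checking the dispatch directory named in the warning on the affected search head (standard shell commands; adjust the path if $SPLUNK_HOME is not /opt/splunk):

# Count and total size of the search artifacts the reaper has to walk through
ls /opt/splunk/var/run/splunk/dispatch | wc -l
du -sh /opt/splunk/var/run/splunk/dispatch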

scorrie_splunk
Splunk Employee

I ran across the same error. I believe it is related to an incomplete (timed out) bundle push.

Error
cluster-master : (/opt/splunk/var/log/splunk)
splunk $ grep -i bundle splunkd.log
04-27-2018 16:05:30.747 +0000 WARN PeriodicReapingTimeout - Spent 18915ms reaping replicated bundles in $SPLUNK_HOME/var/run/searchpeers
04-27-2018 19:03:23.439 +0000 WARN PeriodicReapingTimeout - Spent 11606ms reaping replicated bundles in $SPLUNK_HOME/var/run/searchpeers
04-27-2018 19:03:56.354 +0000 WARN PeriodicReapingTimeout - Spent 14195ms reaping replicated bundles in $SPLUNK_HOME/var/run/searchpeers
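
Before restarting anything, it can also be worth checking how large the replicated-bundle directory named in the warning has grown on the instance that logged it; a directory full of stale bundles is one plausible reason the reaper runs long (standard shell commands, not specific to this fix):

# Size and most recent contents of the replicated-bundle directory
du -sh $SPLUNK_HOME/var/run/searchpeers
ls -lt $SPLUNK_HOME/var/run/searchpeers | head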

Doing the following (on the Cluster Master) resolved it:

Restart CM
splunk enable maintenance-mode
(Optional: splunk show maintenance-mode)
splunk restart
splunk disable maintenance-mode

Rolling Restart & Confirm
splunk rolling-restart cluster-peers

Wait 30 mins. (Time depends on the number of indexers - this environment has 18)
splunk show cluster-bundle-status

Bundle Push
If the peers are not all displaying the same active bundle, do a bundle push.

splunk apply cluster-bundle
(Wait 30 mins)
splunk show cluster-bundle-status
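
To confirm the push took effect, the active bundle reported by each peer can be compared in one pass; the grep pattern below assumes the status output includes a checksum field, which may differ by version:

# All peers should report the same active bundle checksum
splunk show cluster-bundle-status | grep -i checksum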

dctopper
Explorer

I'm seeing the same message; however, it is appearing on my IDX cluster peers. It corresponds to a spike in CPU processing on the affected node. Seen in splunkd.log:

WARN PeriodicReapingTimeout - Spent 57296ms reaping search artifacts in ./var/run/splunk/dispatch
WARN TcpInputProc - Stopping all listening ports. Queues blocked for more than 300 seconds

Bucket replication errors follow as peers try to stream to the affected node.

What might be causing the timeout?
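
For reference, here is what I am checking on the affected peer (default install paths assumed):

# How many search artifacts the reaper has to walk through on this peer
ls /opt/splunk/var/run/splunk/dispatch | wc -l

# When the input queues started reporting blocked=true
grep "blocked=true" /opt/splunk/var/log/splunk/metrics.log | tail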

s2_splunk
Splunk Employee

The message indicates that Splunk took 10.777 seconds removing expired search artifacts from the dispatch directory. I suspect this warning is more a symptom than a cause, but it's hard to say with the information at hand.
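
If you want to check whether the reaping durations line up with the memory spike window, a rough way (assuming the default log location) is to pull the logged durations straight out of splunkd.log:

# Extract the reported reaping durations and show the largest ones
grep "PeriodicReapingTimeout" /opt/splunk/var/log/splunk/splunkd.log \
  | grep -o "Spent [0-9]*ms" | sort -k2 -n | tail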
