i'm seeing this message after firing up backfill data in Splunk Deployment Monitor:
Too many search jobs found in the dispatch directory (found=3692, warning level=2000). This could negatively impact Splunk's performance, consider removing some of the old search jobs.
Is this dangerous? Can I manually clear out var/run/splunk/dispatch when it's done?
If you see this error you can manually clear out any jobs in the dispatch folder. I would probably recommend you start with the older ones. The only downside is that the artifacts of those saved searches which populated the summary index won't be around once you clear the dispatch directory. Since the data you are after is in the summary index, it doesn't matter. Any artifacts you eliminate will be regenerated at the next scheduled run time for a particular saved search.
Yes, the dispatch directory can be cleared; it won't cause any issues.
"Real-time alerts" can also be a cause of many dispatch files.
In my case, one real-time alert was triggering 3-5 times per second.
Once we changed it to a scheduled alert, the problem was solved.
Real-time alerts spammed our dispatch folder and ended up breaking the entire Splunk interface. We cleared /var/run/splunk/dispatch and converted the real-time alerts to scheduled ones, and boom, fixed.
If anyone isn't familiar with cron schedules, setting the schedule to "* * * * *" should fix this problem: the alert runs every minute instead of in real time.
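For reference, converting a real-time alert to a scheduled one is a savedsearches.conf change along these lines (the stanza name and search string here are hypothetical, and the exact settings should be checked against your version's spec file):

```
# $SPLUNK_HOME/etc/system/local/savedsearches.conf

[My Suspicious Source IP Alert]
search = index=main sourcetype=access_combined status=403
enableSched = 1
# Was a real-time window: dispatch.earliest_time = rt-5m, dispatch.latest_time = rt
# Now a scheduled search running every minute over the last 5 minutes:
cron_schedule = * * * * *
dispatch.earliest_time = -5m
dispatch.latest_time = now
```

Each scheduled run produces one dispatch artifact that the reaper can expire normally, instead of a continuous stream of them.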
My user base is constantly doing this... we seem to have to do a quarterly sweep for users building alerts out of real-time searches, which flood the dispatch directory over time...
I can confirm an alert storm caused saturation of our Splunk server. Once the alert was removed, the problem was resolved.
To add to ellen's answer:
Sometimes the move fails because the <old_dispatch> directory created to receive the dispatch jobs may be full. Create another one and move the jobs there; it works.
Our issue looks like a bug in job handling. Certain jobs were not showing as completed and therefore hung around on disk until they expired. The issue was reported to Splunk Support, with no word yet as to whether they will mark it as a bug. For now we have deleted all of these old scheduled searches (probably some configuration issue after an upgrade) and recreated them by hand.
To add to jbsplunk's answer:
The number of directories relating to search artifacts in the dispatch directory can potentially affect search performance, since Splunk has to scan each of the directories to determine whether the artifacts are present.
The UI warning message about dispatch directories exceeding 2K is new in 4.2.3. There isn't a hard limit of 2K that impacts anything; the warning was implemented as a best-practice prompt to start reviewing search jobs in general, e.g. is there a scheduled search with an excessively long ttl?
You can change that 2K threshold to a higher number, so the warning takes longer to appear, via the [search] stanza in $SPLUNK_HOME/etc/system/local/limits.conf.
From the $SPLUNK_HOME/etc/system/README/limits.conf.spec:
[search]
dispatch_dir_warning_size = <int>
* The number of jobs in the dispatch directory when to issue a bulletin message warning that performance could be impacted
* Defaults to 2000
The appropriate number of Dispatch directories that should be set before the performance is impacted would vary per environment as it would depend on variables such as the volume and type of searches being run, what are the ttl etc.
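As a sketch, a local override raising the threshold would look like this (the value 5000 is illustrative; pick one appropriate to your environment):

```
# $SPLUNK_HOME/etc/system/local/limits.conf

[search]
# Raise the bulletin warning threshold from the default of 2000 jobs
dispatch_dir_warning_size = 5000
```

Note this only delays the warning; it does not change how dispatch artifacts are reaped.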
If you find subdirectories in $SPLUNK_HOME/var/run/splunk/dispatch more than 24 hours past their last modtime AND the subdirectory does not contain BOTH info.csv and status.csv files, that is considered a failed search job and the subdirectory can be safely removed. We expect this to be performed automatically by the dispatch reaper starting in 4.3.
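The check above can be sketched as a small shell helper (the function name and paths are my own; review the output before deleting anything):

```shell
#!/bin/sh
# find_failed_jobs DIR: print dispatch subdirectories whose last modtime is
# at least 24 hours old and which are missing info.csv or status.csv.
find_failed_jobs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +0 2>/dev/null |
    while read -r d; do
        if [ ! -f "$d/info.csv" ] || [ ! -f "$d/status.csv" ]; then
            echo "$d"
        fi
    done
}

# Against a real install you would run, e.g.:
# find_failed_jobs "$SPLUNK_HOME/var/run/splunk/dispatch"
```

Pipe the output to `xargs rm -rf` only after confirming the listed jobs are truly dead.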
In the meantime, beyond your own cron/scripting, there is an option to move subdirectories out of the dispatch directory based on a cutoff time you have determined is acceptable.
Below is the usage information.
Use this command to move jobs whose last modification time is earlier than the specified time from the dispatch directory to the specified destination directory.
usage: $SPLUNK_HOME/bin/splunk cmd splunkd clean-dispatch {destination directory where to move jobs} {latest job mod time}
The destination directory must be on the same partition/filesystem as the dispatch directory.
example: splunk cmd splunkd clean-dispatch /opt/splunk/old-dispatch-jobs/ -1month
example: splunk cmd splunkd clean-dispatch /opt/splunk/old-dispatch-jobs/ -10d@d
example: splunk cmd splunkd clean-dispatch /opt/splunk/old-dispatch-jobs/ 2011-06-01T12:34:56.000-07:00
There are future enhancements to manage search job cleanup.
I updated the answer; given the current limitations of clean-dispatch, the destination directory must be on the same filesystem as the dispatch directory.
I had the same problem. I ran the clean-dispatch command and for a few results got the following:
Could not move /opt/splunk/var/run/splunk/dispatch/scheduler_cmerchantsearch_SW50ZXJhY3QgU3VzcGljaW91cyBTb3VyY2UgSVAgQWxlcnQ_at_1339130700_2aeaae3b6e48c487 to /space/splunktmp/schedulercmerchant_search_SW50ZXJhY3QgU3VzcGljaW91cyBTb3VyY2UgSVAgQWxlcnQ_at_1339130700_2aeaae3b6e48c487. Invalid cross-device link
To be honest, I would expect Splunk to clean these out; it seems like it should be an automatic housekeeping task. We are starting to see this quite regularly in our deployment.
Conversely, is there some way to raise the threshold if you are actually running that many searches?
Are there any recent updates on how to clean up those extra jobs that keep running and producing errors?
I ran the clean-dispatch command and it moves them to the old-dispatch directory. Do I really need to keep them, or can I just delete them?
Great question! We have had Splunk for 2 years, and I have never had anyone ask me for search results from the old-dispatch-jobs directory. So in my mind that means you can delete them, say, after a week. :-)