I see the error "Too many search jobs found in the dispatch directory" many times. I know how to clean the directory with the clean-dispatch command, and there are plenty of answers on how to do that.
But I couldn't find an answer to why the old jobs are not getting deleted in the first place.
I can see many jobs older than 1 day. We have not changed any of the default ttl settings for jobs, and the issue is not consistent either.
Is there a root cause why jobs older than 1 day are not deleted? I thought the jobs were supposed to be deleted after 10m.
Job artifact retention is controlled by the "dispatch.ttl" setting.
https://docs.splunk.com/Documentation/Splunk/7.3.0/Admin/Savedsearchesconf
Indicates the time to live (ttl), in seconds, for the artifacts of the scheduled search, if no actions are triggered.
If an action is triggered, then it is based upon the configuration for that action. For example, if we alert our DevOps group, we want the job to be retained for a minimum of three days, to allow for low-priority job review after a weekend.
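The three-day example above would look roughly like this in savedsearches.conf (a sketch; the stanza name is hypothetical, and dispatch.ttl only governs the case where no alert action fires):

```ini
# savedsearches.conf -- sketch; stanza name is hypothetical
[devops_alert]
# Keep job artifacts for 3 days (3 x 86400 seconds) when no action fires.
# dispatch.ttl accepts an integer (seconds) or <int>p (multiples of the
# scheduled search period).
dispatch.ttl = 259200
```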
Have you checked the setting below?
alert.expires =
* Sets the period of time to show the alert in the dashboard. Use [number][time-unit]
to specify a time.
* For example: 60 = 60 seconds, 1m = 1 minute, 1h = 60 minutes = 1 hour etc
* Defaults to 24h.
* This property is valid until splunkd restarts. Restart clears the listing of
triggered alerts.
The property to set for the job (in savedsearches.conf) is dispatch.ttl, which defaults to 2p (that is, 2 x the period of the scheduled search). So the expiration is calculated from the schedule period of the search. See the documentation for more details.
http://docs.splunk.com/Documentation/Splunk/latest/Admin/Savedsearchesconf
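To make the 2p default concrete: a search scheduled every 5 minutes keeps its artifacts for roughly 10 minutes, assuming no alert actions fire (a sketch; the stanza name is hypothetical):

```ini
# savedsearches.conf -- sketch; stanza name is hypothetical
[five_minute_health_check]
cron_schedule = */5 * * * *
# The implicit default is dispatch.ttl = 2p:
# 2 x the 5-minute schedule period = about 10 minutes of artifact retention.
```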
Thanks for the response.
I agree with you about the scheduled search ttl.
But in my case I can see jobs in the dispatch directory that are older than 2-4 days, even though those searches are scheduled to run every 1 or 2 minutes.
I also see job IDs older than 2-4 days in the dispatch directory for searches that were run manually.
I checked the splunkd.log file; there is no information about failures in deleting these jobs.
So I'm wondering whether the dispatch reaper is even trying to delete them.
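One way to audit this is to list dispatch subdirectories older than a day and compare their ages against the expected ttl. A minimal sketch, assuming the default Splunk install path; adjust SPLUNK_HOME for your environment:

```shell
# Sketch: list search-job artifacts older than 24 hours.
# SPLUNK_HOME default of /opt/splunk is an assumption.
DISPATCH_DIR="${SPLUNK_HOME:-/opt/splunk}/var/run/splunk/dispatch"
if [ -d "$DISPATCH_DIR" ]; then
  # Each search job is one subdirectory; -mtime +1 = modified >24h ago.
  find "$DISPATCH_DIR" -maxdepth 1 -type d -mtime +1
else
  echo "dispatch directory not found: $DISPATCH_DIR"
fi
```

If that list is non-empty while splunkd.log shows no deletion failures, the reaper likely never attempted those jobs.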
If the scheduled job is an alert, the retention time is also longer.
What does the Job Manager tell you? (If you can access it, that is, since it will break if there are too many jobs.)
We notice that when the SHs are very busy, the dispatch reaper fails or doesn't have time to delete everything.
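As a side note, the threshold that triggers the "Too many search jobs found in the dispatch directory" warning is itself configurable in limits.conf (a sketch; verify the setting and its default against the limits.conf spec for your Splunk version):

```ini
# limits.conf -- sketch; check your version's spec before relying on this
[search]
# Number of jobs in the dispatch directory above which splunkd logs the
# "Too many search jobs" warning. Raising it hides the symptom only; the
# reaper backlog still needs to be addressed.
dispatch_dir_warning_size = 5000
```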