Hi Team,
I am getting the below error message on my Splunk ES search head. Is there any troubleshooting I can perform in Splunk Web to correct this? Please help.
PS: I don't have access to the backend.
Hi,
The issue has recurred. I modified a few of the scheduled searches running on All Time, staggered the cron schedules, etc., and it helped for a while.
For the past few days, the count for the delayed-search error has increased to 15,000+.
Could you please help me resolve this permanently 😞
Also, is there any query I can use to find out which searches are getting delayed?
I am using this one:
index=_internal sourcetype=scheduler savedsearch_name=* status=skipped
| stats count BY savedsearch_name, app, user, reason
| sort -count
Can someone please help me out with this.
@PickleRick @richgalloway
That query finds *skipped* searches, not delayed ones. A delayed search runs late but still runs, as opposed to a skipped search, which does not run at all (at that time). Try this instead:
index=_internal sourcetype=scheduler savedsearch_name=* status=deferred
| stats count BY savedsearch_name, app, user, reason
| sort -count
As I said before - these are your searches, your data, and your environment. You have to check what searches you have, which of them are executed at what frequency, and how long they take to run.
It's not something that can be automated. It's tedious manual work to dig into those searches and decide whether they are needed, whether they need to run that often, or whether they need to run over such a long time range.
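If you want a starting point for the run-time part, the scheduler logs also record how long each scheduled search took. Something along these lines should work (a rough sketch; run_time is in seconds and the exact fields can vary by version):
index=_internal sourcetype=scheduler status=success savedsearch_name=*
| stats count avg(run_time) AS avg_runtime max(run_time) AS max_runtime BY savedsearch_name, app, user
| sort - avg_runtime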
It can easily happen if you don't manage your environment strictly enough, don't have a well-defined process for configuring your searches, and if your users have too "loose" permissions and can create scheduled searches on a whim (especially if they can't write them effectively).
Scheduled "All Time" searches are typically a no-fly zone and you should look to see what your average or max run time is for those searches. My general rule of thumb is that any search completion time should be less than 50% of the scheduled reoccurrence.
ie. Run Time less than 2 mins 30 seconds if schedule reoccurrence is 5 mins
If you absolutely must and I highly doubt there is any good reason that you need an "All Time" search try converting to a TSTATs search which processes much faster. Even then I would stay away from that since "All Time" can be a significant drain and occupy resources for long periods of time.
Can you get away with putting daily results into a summary index, searching "All Time" on the summary index is far superior then searching raw data events.
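As a rough sketch of that approach (the index names, source value, and fields here are placeholders; adjust them to your data), a daily scheduled search over yesterday could write its results into a summary index:
index=your_index earliest=-1d@d latest=@d
| stats count BY sourcetype
| collect index=my_summary source="daily_sourcetype_counts"
The "All Time" report then runs against the much smaller summary instead of the raw events:
index=my_summary source="daily_sourcetype_counts"
| stats sum(count) AS total BY sourcetype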
You have too many searches trying to run at the same time. That means some searches have to wait (are delayed) until a search slot becomes available. Use the Scheduled Searches dashboard in the Cloud Monitoring Console to see which times have the most delays and reschedule some of the searches that run at those times.
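If you'd rather check from a search than from the dashboard, something like this (a sketch using the standard scheduler logs) should show when the delays cluster:
index=_internal sourcetype=scheduler status=deferred
| timechart span=5m count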
Hi @richgalloway,
Apologies, this might be a silly question, but I am fairly new to Splunk.
I want to understand: is this delayed error message caused only by scheduled searches, or do ad-hoc searches also contribute to the error?
I have a few scheduled searches running on "All Time"; could this be the cause of the delayed searches?
Should I reduce the timeframe of these searches?
Also, there are many scheduled searches all running on a cron of every 5 minutes; do I need to change them as well?
Thanks in advance.
Well, both yes and no.
No, because the message only indicates that scheduled searches have been delayed (ad-hoc searches have the highest priority and, unless you have many concurrent users and a very low-spec environment, they usually run properly). Yes, because ad-hoc search activity influences how many scheduled searches can be spawned.
And yes, all-time searches are very rarely a good idea, at least on raw data.
Also, even if you have many searches that are supposed to run every 5 minutes, you can often "spread" them over those 5 minutes so that some of them start at 0, 5, 10 and so on, some at 1, 6, 11..., some at 2, 7, 12... You get the drift.
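In Splunk's cron syntax that staggering could look something like this (just an illustration; pick offsets that suit your own searches):
*/5 * * * * for the first group (minutes 0, 5, 10, ...)
1-59/5 * * * * for the second group (minutes 1, 6, 11, ...)
2-59/5 * * * * for the third group (minutes 2, 7, 12, ...)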
My number of delayed searches has increased to 5,000+. I did some investigation using this command:
index=_internal sourcetype=scheduler savedsearch_name=* status=skipped | stats count by reason
I see that the reason "The maximum number of concurrent historical scheduled searches on this cluster has been reached" has a count of 2,000+.
The two solutions to fix this that I have understood are:
1. Staggering the searches that are causing the error by modifying their cron schedules and changing their frequency.
2. Increasing the search concurrency limit in limits.conf.
(Please feel free to correct me if I am wrong.)
Since I am on Splunk Cloud, I understand I don't have access to limits.conf.
What I want to ask is: I see an option under Settings > Server Settings > Search Preferences > Relative concurrency limit for scheduled searches, which is set to 60 for my system.
Will increasing this setting help, and if yes, to what value is it safe to increase it?
Please help, I have been stuck on this problem for some days 😞
Usually (as always, it's a general rule of thumb; impossible to say without detailed knowledge of your environment and data; YMMV and all the standard disclaimers apply) fiddling with search concurrency is not the way to go. You can't get more computing power to run your searches than you have raw performance in your hardware. So even if you raise the concurrency, Splunk will be able to spawn more search processes, but they will starve each other of resources, because there's only so much iron underneath to use.
So check what is eating up your resources, disable unneeded searches, optimize the needed ones, teach your users to write efficient searches, and so on.
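If you want a rough view of what is consuming the most search time (a sketch against the audit index; total_run_time is in seconds and ad-hoc searches will show an empty savedsearch_name), something like this can help:
index=_audit action=search info=completed
| stats count sum(total_run_time) AS total_runtime BY user, savedsearch_name
| sort - total_runtime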