Splunk Cloud Platform

Getting "Searches Delayed" warning on the Splunk ES (Cloud) search head

man03359
Communicator

Hi Team,

I am getting the below error message on my Splunk ES search head. Is there any troubleshooting I can perform from Splunk Web to correct this? Please help.
PS. I don't have access to the backend.

 

delayed search.PNG


man03359
Communicator

Hi,

The issue has re-occurred. I modified a few of the scheduled searches running on All time, staggered the cron schedules, etc., and it helped for a while.
Over the past few days, the count of delayed searches has increased to 15,000+.

Could you please help me resolve this permanently 😞

man03359_0-1732111611337.png

 



Also, is there any query I can use to find out which searches are getting delayed?
I am using this one:

 

index=_internal sourcetype=scheduler savedsearch_name=* status=skipped
| stats count BY savedsearch_name, app, user, reason
| sort -count

 

 

Can someone please help me out with this.
@PickleRick @richgalloway 


richgalloway
SplunkTrust

That query finds *skipped* searches, not delayed ones.  A delayed search runs late, but still runs, as opposed to a skipped search which does not run at all (at that time).

index=_internal sourcetype=scheduler savedsearch_name=* status=deferred
| stats count BY savedsearch_name, app, user, reason
| sort -count
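If you also want to see how late the delayed searches actually started, something along these lines should work (assuming the completed scheduler events carry the scheduled_time and dispatch_time epoch fields, which they normally do):

index=_internal sourcetype=scheduler status=success savedsearch_name=*
| eval delay_sec = dispatch_time - scheduled_time
| stats avg(delay_sec) AS avg_delay_sec, max(delay_sec) AS max_delay_sec, count BY savedsearch_name, app, user
| sort -max_delay_sec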
---
If this reply helps you, Karma would be appreciated.

PickleRick
SplunkTrust

As I said before - these are your searches, your data and your environment. You have to check what searches you have, which of them are executed at what frequency, and how long they take to run.

It's not something that can be automated. It's tedious manual work to dig into those searches and decide whether they are needed, whether they need to run that often, or whether they need such a long time range.

It can easily happen if you don't manage your environment strictly enough, don't have a well-defined process for configuring your searches, and if your users have too "loose" permissions and can create scheduled searches on a whim (especially if they can't write them effectively).


dural_yyz
Motivator

Scheduled "All Time" searches are typically a no-fly zone, and you should look at what the average or max run time is for those searches. My general rule of thumb is that any search's completion time should be less than 50% of its scheduled recurrence interval.

i.e. a run time of less than 2 minutes 30 seconds if the schedule recurs every 5 minutes.
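One way to sanity-check that rule of thumb against your own scheduler history (a rough sketch, not a polished report: it assumes the scheduled_time and run_time fields from the scheduler log, and it estimates each search's interval from the gap between consecutive runs, so skipped runs will skew it slightly):

index=_internal sourcetype=scheduler status=success savedsearch_name=*
| sort 0 savedsearch_name scheduled_time
| streamstats current=f last(scheduled_time) AS prev_scheduled BY savedsearch_name
| eval interval_sec = scheduled_time - prev_scheduled
| where isnotnull(interval_sec) AND run_time > 0.5 * interval_sec
| stats count AS slow_runs, avg(run_time) AS avg_runtime_sec, avg(interval_sec) AS avg_interval_sec BY savedsearch_name, app
| sort -slow_runs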

If you absolutely must (and I highly doubt there is any good reason you need an "All Time" search), try converting it to a tstats search, which processes much faster. Even then I would stay away from it, since "All Time" can be a significant drain and occupy resources for long periods of time.
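For illustration only (the index name main and the split-by field are placeholders, not something from this thread): a long-range count that is expensive over raw events, such as

index=main | stats count BY sourcetype

can often be rewritten against indexed fields with tstats:

| tstats count WHERE index=main BY sourcetype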

Can you get away with putting daily results into a summary index? Searching "All Time" on a summary index is far superior to searching raw data events.
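A rough sketch of that approach (the index name my_summary, the field names and the daily window are hypothetical): schedule something like this to run once per day over the previous day and write its aggregated results to a summary index:

index=main earliest=-1d@d latest=@d
| stats count AS daily_events BY sourcetype
| collect index=my_summary

The "All Time" report then only has to read the summary, e.g. index=my_summary | stats sum(daily_events) AS total_events BY sourcetype, which touches a tiny fraction of the data.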

richgalloway
SplunkTrust

You have too many searches trying to run at the same time. That means some searches have to wait (are delayed) until a search slot becomes available. Use the Scheduled Searches dashboard in the Cloud Monitoring Console to see which times have the most delays, and reschedule some of the searches that run at those times.
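If you prefer a search over the dashboard, something like this (reusing the deferred/skipped scheduler statuses shown elsewhere in this thread) shows which times of day are the most crowded:

index=_internal sourcetype=scheduler savedsearch_name=* status IN (deferred, skipped)
| timechart span=5m count BY status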

---
If this reply helps you, Karma would be appreciated.

man03359
Communicator

Hi @richgalloway ,

Apologies, this might be a silly question, but I am fairly new to Splunk.

I want to understand: is this delayed error message caused only by scheduled searches, or do ad-hoc searches also contribute to the error?

I have a few scheduled searches running on "All time"; could this be the cause of the delayed searches?
Should I reduce the timeframe of these searches?

Also, there are many scheduled searches all running on a cron schedule of every 5 minutes; do I need to change them as well?

 

Thanks in advance.


PickleRick
SplunkTrust

Well, both yes and no.

No, because the message only indicates that scheduled searches have been delayed (ad-hoc searches have the highest priority and, unless you have many concurrent users and a very low-spec environment, they usually run properly). Yes, because ad-hoc search activity influences how many scheduled searches can be spawned.

And yes, all-time searches are very rarely a good idea. At least on raw data.

Also, even if you have many searches that are supposed to run every 5 minutes, you can often "spread" them over those 5 minutes so that some of them start at 0, 5, 10 and so on, some at 1, 6, 11... some at 2, 7, 12... You get the drift.
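In cron terms the staggering would look roughly like this (illustrative only; on Splunk Cloud you set the cron expression per search in the Edit Schedule dialog):

*/5 * * * *        first group runs at :00, :05, :10, ...
1-59/5 * * * *     second group runs at :01, :06, :11, ...
2-59/5 * * * *     third group runs at :02, :07, :12, ...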

man03359
Communicator

Hi @PickleRick @richgalloway 

My number of delayed searches has increased to 5,000+. I did some investigation using this command:
index=_internal sourcetype=scheduler savedsearch_name=* status=skipped | stats count by reason
I see that the reason "The maximum number of concurrent historical scheduled searches on this cluster has been reached" has a count of 2,000+.

man03359_0-1729796629505.png

The two solutions to fix this, as I understand them, are:

1. Staggering the searches that are causing the error by modifying their cron schedules and changing their frequency.
2. Increasing the search concurrency limit in limits.conf.
(Please feel free to correct me if I am wrong.)
Since I am on Splunk Cloud, I understand I don't have access to limits.conf.
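For reference, on a self-managed deployment the relevant settings would live in limits.conf; this is only a sketch of the documented defaults, and on Splunk Cloud they are managed by Splunk:

[search]
base_max_searches = 6        # constant added to the per-CPU allowance
max_searches_per_cpu = 1     # concurrent historical searches allowed per CPU core

[scheduler]
max_searches_perc = 50       # share (%) of the total reserved for scheduled searches;
                             # as far as I understand, this is what the "Relative concurrency
                             # limit for scheduled searches" setting in the UI controls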

What I want to ask is: I see an option under Settings > Server settings > Search preferences > Relative concurrency limit for scheduled searches, which is set to 60 on my system.

Will increasing this setting help? If yes, to what value is it safe to increase it?

Please help, I have been stuck on this problem for some days 😞

man03359_1-1729798276499.png

 

 


PickleRick
SplunkTrust

Usually (as always, this is a general rule of thumb; it's impossible to say without detailed knowledge of your environment and data; YMMV and all the standard disclaimers apply) fiddling with search concurrency is not the way to go. You can't get more computing power to run your searches than you have raw performance in your hardware. So even if you raise the concurrency, Splunk will be able to spawn more search processes, but they will starve each other of resources because there's only so much iron underneath to use.

So check what is eating up your resources, disable unneeded searches, optimize the needed ones, teach your users to write effective searches and so on.
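A starting point for that (again a rough sketch, relying on the run_time field that completed scheduler events normally carry) is to rank scheduled searches by how much scheduler time they actually consume and work down the list:

index=_internal sourcetype=scheduler status=success savedsearch_name=*
| stats sum(run_time) AS total_runtime_sec, avg(run_time) AS avg_runtime_sec, count AS runs BY savedsearch_name, app, user
| sort -total_runtime_sec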
