
Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

SplunkTrust

Currently I have two search head clusters. One has a smaller number of users and therefore fewer scheduled searches; its latency is generally around 3 seconds, which is great!

However, my other cluster, which has 4 nodes, can go as high as 30+ seconds of latency during busy periods.
Since the default Splunk "run every 5 minutes" or "run every hour" schedules default to being on the hour, the problem usually occurs around times such as 10 o'clock, 11 o'clock, et cetera.
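
For illustration, the stock schedules all land on the same minute, whereas hand-staggered cron expressions would spread the load (the offsets below are just examples, and hand-editing every alert doesn't scale):

# UI defaults: everything fires together
*/5 * * * *     (run every 5 minutes)
0 * * * *       (run every hour)

# Staggered examples
3-59/5 * * * *
17 * * * *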

The search heads have a limited number of CPUs; however, they are only utilising 20-30% CPU on the Linux machines.
There are a large (and growing) number of alerts that will be run by these search heads.

I've checked very carefully, and as far as I can see none of the searches are delayed due to quota enforcement.
In terms of configuration changes, I have tried making the captain an ad-hoc search head, and I've increased the number of non-ad-hoc search heads from 3 to 4; there has been a very slight reduction in latency.

No search skipping is occurring, so it's purely a latency issue when searches execute. Is there anything I can tune?

I am running Splunk 6.5.2 and looking at 6.6.1 now...
Happy to award the 25 points to anyone who can help with the tuning process and resolve this!
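
For reference, I'm measuring the latency roughly like this, from the scheduler's own _internal events (this is a sketch assuming the standard scheduled_time/dispatch_time fields in scheduler.log):

index=_internal sourcetype=scheduler status=success
| eval latency = dispatch_time - scheduled_time
| timechart span=5m avg(latency) max(latency)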


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

SplunkTrust

Hello there,
I found out recently that when saving an alert or scheduled search, Splunk does not set schedule_window by default. From savedsearches.conf.spec:

schedule_window = <unsigned int> | auto
* When schedule_window is non-zero, it indicates to the scheduler that the
  search does not require a precise start time. This gives the scheduler
  greater flexibility when it prioritizes searches.
* When schedule_window is set to an integer greater than 0, it specifies the
  "window" of time (in minutes) a search may start within.
  + The schedule_window must be shorter than the period of the search.
  + Schedule windows are not recommended for searches that run every minute.
* When set to 0, there is no schedule window. The scheduler starts the search
  as close to its scheduled time as possible.
* When set to "auto," the scheduler calculates the schedule_window value
  automatically.
  + For more information about this calculation, see the search scheduler
    documentation.
* Defaults to 0 for searches that are owned by users with the
  edit_search_schedule_window capability. For such searches, this value can be
  changed.
* Defaults to "auto" for searches that are owned by users that do not have the
  edit_search_window capability. For such searches, this setting cannot be
  changed.
* A non-zero schedule_window is mutually exclusive with a non-default
  schedule_priority (see schedule_priority for details).
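
In savedsearches.conf that looks like this, for example (the stanza name is just a placeholder):

[Example alert]
cron_schedule = */5 * * * *
schedule_window = auto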

I ran the following search to check the status of this setting:

| rest /servicesNS/-/-/saved/searches 
| search is_scheduled=1 
| table title eai:acl.app eai:acl.owner cron_schedule next_scheduled_time schedule_window search

To my surprise, all the searches had a value of either "default" or "0" in the schedule_window field.
Maybe the reason is that the dialog that pops up right away when you save a search is a little confusing; see the screenshot below. In any case, changing every schedule_window to "auto" immediately reduced the number of skipped searches and the latency in the environment I was working on.
[Screenshot: the save dialog shown when scheduling a search]
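
If you have many searches to change, the same setting can also be pushed via REST instead of clicking through the UI; a rough sketch (credentials, owner, app, and search name below are placeholders):

curl -k -u admin:changeme \
    "https://localhost:8089/servicesNS/nobody/search/saved/searches/Example%20alert" \
    -d schedule_window=auto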
Hope it helps!


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

SplunkTrust

I gave it an upvote as it's a good answer, but it's not quite what I'm looking for.
I suspect there is a setting I can use to tune the scheduler threadpool or similar, or at least there should be.

Thank you for your time and the tip!


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

SplunkTrust

Thanks!
It will be interesting to learn of such a setting.


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

Contributor

So what did you finally end up doing?


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

Path Finder

Assuming that you are not I/O bound, which might be something to look into and double-check, and that your indexers are able to keep up, since that is often the source of latency:

You could increase the number of executor_workers in your server.conf:

[shclustering]
executor_workers = <positive integer>
* Number of threads that can be used by the search head clustering
  threadpool.
* Defaults to 10. A value of 0 will be interpreted as 1.
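
For example (the value of 20 is purely illustrative, not a recommendation; test what works in your environment):

[shclustering]
executor_workers = 20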

But again, I would evaluate whether your indexers are unable to keep up with the additional calls during your more heavily loaded windows. In my experience, latency is often imparted by the indexing layer, not the SHC.


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

SplunkTrust

I actually saw this setting, but I'm unclear on what the "number of threads that can be used by the search head clustering threadpool" actually covers.

Is this the pool that relates to scheduling?

If so, I'll accept the answer and request a clarification from the documentation team.


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

Path Finder

It is the number of threads in the threadpool, which controls how many threads the scheduler can address.


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

SplunkTrust

I'm going to suggest to the documentation team that the above wording, or something similar, is used so it's clearer...

Thank you


Re: Splunk scheduler - how can I reduce latency? What can be tuned beyond adding more CPU power?

SplunkTrust

Suggestion submitted. FYI, that change seemed to drop my scheduler latency to between 20 and 35 seconds; however, on later review that might be coincidence, so I will need to re-measure over a longer period of time.

Are there any other settings I can tweak?
I did notice there are 2-4 simultaneous scheduled searches kicking off per second per search head, so I'm unsure if that can be changed (see the limits.conf sketch below).

FYI, the indexers average 36% CPU around the times when scheduler latency is high. Note that my issue is not how long the searches take to run; the problem is how quickly the scheduler kicks the searches off.
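
On the simultaneous-search point, my understanding (worth verifying against limits.conf.spec for your version) is that scheduler concurrency is a percentage of the overall search slots, so I may experiment with something like the following; the values here are illustrative only:

[search]
# overall slots = max_searches_per_cpu x number of CPUs + base_max_searches
max_searches_per_cpu = 2
base_max_searches = 6

[scheduler]
# share of the overall slots the scheduler may consume (default 50)
max_searches_perc = 75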
