Monitoring Splunk

Saved search performance question

mookiie2005
Communicator

We have around 80 saved searches that run every minute on our search head. Each night the search dispatch times slide from running on time to being dispatched 2 hours late by the end of the night, before we have to restart the search head. We have one search head and two indexers. Our search head has 32 cores, and max_searches_per_cpu is set to the default of 4. We think we should be able to run 132 saved searches concurrently: 4 + 32 * 4 = 132. Is this math correct? How many search heads should be used if we are trying to run 80 saved searches per minute? Does anyone run a similar number of searches? How many search heads gives you adequate performance?
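As a sketch of the concurrency arithmetic in the question: the base term and the per-CPU multiplier come from limits.conf (base_max_searches and max_searches_per_cpu), and the scheduler is typically allowed only a percentage of the total slots (max_searches_perc, commonly 50%), so the effective ceiling for scheduled searches can be well below the raw total. The exact defaults vary by Splunk version, so treat the numbers below as assumptions taken from the post.

```python
# Sketch of the search-concurrency math, using the values from the post.
# base_max_searches and max_searches_per_cpu live in limits.conf; the
# scheduler is assumed to be capped at max_searches_perc percent of the total.
def max_concurrent_searches(cpus, per_cpu=4, base=4):
    """Total concurrent search slots: base + cpus * per_cpu."""
    return base + cpus * per_cpu

def scheduler_slots(total, max_searches_perc=50):
    """Portion of the total slots available to scheduled (saved) searches."""
    return total * max_searches_perc // 100

total = max_concurrent_searches(32)   # 4 + 32 * 4 = 132
sched = scheduler_slots(total)        # 50% of 132 = 66
print(total, sched)
```

If only ~66 of those 132 slots are actually available to the scheduler, 80 one-minute searches can start queuing even before any individual search runs long.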

sowings
Splunk Employee

I'll ask: why so many searches every minute? Do the people consuming these alerts actually act upon them within a minute? It's like when people don't understand the price tag attached to the fabled "five 9's" (99.999%) of uptime.

It might make more sense to run a handful at a time, on a five-minute schedule, with some at :01, some at :02, some at :03, repeating around the clock.
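That kind of staggering can be expressed with cron offsets in savedsearches.conf; the stanza names below are hypothetical placeholders, but cron_schedule is the standard key.

```
# Hypothetical savedsearches.conf stanzas: each group runs every 5 minutes,
# offset by one minute so the scheduler never dispatches everything at once.
[group_a_health_checks]
cron_schedule = 1-56/5 * * * *

[group_b_health_checks]
cron_schedule = 2-57/5 * * * *

[group_c_health_checks]
cron_schedule = 3-58/5 * * * *
```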

alacercogitatus
SplunkTrust

Do all 80 saved searches finish within 1 minute? If not, you will start stacking searches onto the queue until it's so bogged down you have to restart. That search head sounds beefy enough; how much RAM do you have on it? Why do you require 80 saved searches every minute? It feels as if some optimization could be done.

sowings
Splunk Employee

After running the search, you'll want to hit the little icon that looks like a stair-stepped column chart, to the left of the "Export" link in the results area. It should produce a column chart showing literally the "tall poles" — the long-running searches. If you want to see it in tabular form, run:


index=_internal source=*scheduler.log earliest=@h | stats count, sum(run_time) AS runtime by savedsearch_name | sort - runtime

Change the "earliest" parameter as desired. The example shown is "back to the top of the hour." Count is number of runs, runtime is in seconds.

alacercogitatus
SplunkTrust

Based on the specs you provided for your search head and the amount of data processed, you're good in terms of hardware. I'd start by following sowings' recommendation, then back the schedule off to 5 minutes minimum. A 5-minute notification interval is normally more than enough time and doesn't cause huge issues; most users don't even notice an outage that fast.

martin_mueller
SplunkTrust

Concerning optimization strategies: first identify the longest-running searches (using that scheduler.log query). As a first step I'd reduce the number of events scanned and look for redundancies; there are many more strategies beyond that.

Detailed optimization depends on each individual query, for that I'd redirect you to our sales department 🙂

mookiie2005
Communicator

index=_internal source=*scheduler.log | timechart sum(run_time) AS runtime by savedsearch_name

This search returned gibberish.
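One likely cause of unreadable output here is that timechart produces one series per saved search, and with ~80 searches most of them collapse into an OTHER bucket. A sketch of a more readable variant, assuming the same scheduler.log fields, uses timechart's limit and useother options to keep only the top offenders:

```
index=_internal source=*scheduler.log earliest=-24h
| timechart limit=10 useother=f sum(run_time) AS runtime by savedsearch_name
```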

mookiie2005
Communicator

We have spread out our saved searches, and the result is we are running 80 per minute. Let's move away from how many searches per minute and look more at how we can tune our search head to better utilize the resources available. We index about 30 GB of data per day, give or take. When you say "optimizing your saved searches", can anyone point me to some documentation that describes this?

sowings
Splunk Employee

Consider evaluating the "tall pole" with a search like the one below, and optimize some of the searches themselves to improve the overall runtime.


index=_internal source=*scheduler.log | timechart sum(run_time) AS runtime by savedsearch_name

alacercogitatus
SplunkTrust

Very true. mookiie: how much are you indexing per day?

martin_mueller
SplunkTrust

Concerning the "how many search heads" part of your question: whether your performance is bottlenecked by the search head or the indexers depends on the searches. If they're data-heavy, you might be maxing out your indexers serving data rather than the search head; in that case you may see gains either by optimizing your searches to scan fewer events or by adding more indexers to serve up the data faster.

alacercogitatus
SplunkTrust

My first task would be to see if you can combine any of the searches into a smaller number, and then increase the interval between runs so it exceeds the execution time, if possible. For example, if you have 5 service owners, you may be able to create a single search covering all of their services and send one email or alert with all the relevant information. Also take a look at optimizing the searches themselves; this becomes very important as you scale up.
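As a sketch of that consolidation idea: instead of one scheduled search per service, a single search split by a service field can check everything at once. The index, sourcetype, and field names below are hypothetical placeholders for whatever your environment actually uses.

```
index=app_logs sourcetype=service_logs log_level=ERROR earliest=-5m
| stats count AS error_count by service_name
| where error_count > 0
```

One scheduled alert on this search, with the results table in the notification, can replace several per-service searches and frees up scheduler slots.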

mookiie2005
Communicator

We have 80 saved searches that run per minute because that is what the service owners have requested to validate the environment. The searches are not very complex and usually finish very fast, but not all 80 are able to complete within the minute. As you said, this backs up the search head and the searches are executed later and later than they were scheduled for. The search head has 32 GB of RAM.
