skipped savedsearches

Communicator

Hello,

I have a case opened for this - but it seems that this forum can be quicker at times...

I run between 100 and 200 saved searches at a one-minute interval on each of my indexers. These searches are used for operational monitoring and generate a trap on an alert match. In v3.4 these ran without a problem, with very few skips. In 4.1 I notice that on some busy indexers we're skipping about 50% of them.

These are historical searches, not real-time. As a workaround we've increased our interval to 2 or 3 minutes and this has helped somewhat, but I hear that there's also a max_searches_per_cpu setting in limits.conf that can make a difference.

Currently this value is set to 4, and if I understand the documentation correctly, on a system with 8 virtual CPUs we should be able to run 9 concurrent searches: (4 x 8 + 4) * 0.25 = 9. This exactly matches what I see on the search "status->scheduler activity->overview" dashboard for Avg Running Searches - we're topped out at 9 running searches.
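The arithmetic above can be checked with a small sketch. The function and parameter names here are hypothetical, the constant 4 is taken directly from the formula in the post, and rounding down is an assumption, since the post only shows cases that divide evenly.

```python
def scheduler_search_limit(cpus, max_searches_per_cpu=4, base=4, scheduler_perc=25):
    """Concurrent scheduled-search limit, following the formula quoted in the post.

    (max_searches_per_cpu x cpus + base) x scheduler_perc%
    Rounding down is an assumption; it is not stated in the post.
    """
    total_limit = max_searches_per_cpu * cpus + base
    return total_limit * scheduler_perc // 100


# 8 virtual CPUs with max_searches_per_cpu = 4, as in the post
print(scheduler_search_limit(8))                          # 9
# Same box if max_searches_per_cpu = 8 were actually picked up
print(scheduler_search_limit(8, max_searches_per_cpu=8))  # 17
```

If the formula holds, a limit that stays at 9 after the change would suggest the new value is not being read at all.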

However, after increasing max_searches_per_cpu to 8 and restarting Splunk, I still see our Avg Running Searches at 9 and the same number of skips. Is this the correct setting to change for scheduled search performance? Am I looking at the correct metrics to see if my performance has improved? Any other ideas to lessen the skips?

Thank you

1 Solution

Splunk Employee

It's probably better to increase the [scheduler] max_searches_perc to something more than 25%, but most likely either you've simply reached the physical limit of what your hardware can run, or you've configured max_searches_per_cpu incorrectly, e.g., you didn't put it under a [search] stanza in limits.conf.
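As a rough illustration of the stanza placement described above, a limits.conf might look like the sketch below. The file path is the usual location for local overrides and is an assumption here, and the values are examples only, not recommendations.

```
# $SPLUNK_HOME/etc/system/local/limits.conf  (assumed location)

[search]
# Must sit under [search]; outside this stanza the setting is not applied
max_searches_per_cpu = 6

[scheduler]
# Share of the total search limit available to scheduled searches, in percent
# (25 is the value discussed in this thread; raise it to give the scheduler more)
max_searches_perc = 25
```

Splunk needs a restart after editing the file for the change to take effect.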

(Because 4.x runs searches in separate processes, unlike 3.x, there is more overhead for kicking off each search. While overall and larger searches perform much better in 4.x, if your searches are very tiny, and they appear to be, it's possible that 4.x is using up more resources executing all those searches than 3.x did, because of the cumulative overhead of launching 100-200 new search processes every minute.)

Super Champion

Just another thought. I found when we upgraded from 3.4.x to 4.0 that we could actually reduce the number of saved searches. Mostly this was due to the search language improvements made in 4.x. In particular, I found the revamped eval logic and overhauled transaction commands to be much more useful in 4.x than in 3.x.

In one specific case where we were doing some complex FTP transaction analysis, which was used to populate our summary index, we were able to take 4 separate saved searches and combine all of the logic into a single search. And as a bonus, the resulting search was actually faster than the ones in 3.x. On top of that, we were able to capture some additional business scenarios within the data without too much additional work. I was quite impressed.

Anyway, this isn't a quick fix or applicable in all situations, but I thought it was worth pointing out that, thanks to a greatly improved search language, you may find some consolidation options here or there.

Happy Splunking!

Communicator

You were right - the problem was that I didn't have the search stanza in my modified copy of the limits.conf. Adjusting the setting from 4 to 6 seems to have made a big difference now - I'm seeing no skips and execution lag time has dropped as well. As long as ad hoc search performance is ok - I think the problem is resolved.