Search Head Performance verification for high load of users

pgadhari
Builder

We have a dashboard with 12 panels from different sources, and each panel is powered by a summary index. At present, the dashboard takes approximately 17 seconds to load all the panels. We have to roll out this dashboard across the company, where 1,000 to 2,000 users would be accessing it. We also expect 300 to 400 users to access the dashboard simultaneously once it is rolled out.

Each SH has 24 CPUs and 64 GB of memory, and each indexer has 16 CPUs and 64 GB of memory.

I need to do a proactive check of which parameters can be checked or set on the search heads and indexers to make sure this user load does not create performance issues. We have 3 search heads and 4 indexers in a clustered environment. I need recommendations on which performance parameters should be checked before the rollout, such as the number of concurrent searches for each user, the limit on historical searches, and any indexer performance settings. Please advise.


richgalloway
SplunkTrust

First, your architecture is under-powered. You have 72 SH CPUs, but only 64 indexer CPUs. That means your indexers don't have enough CPUs to run as many searches as your SHC can run. That doesn't include the resources needed to index incoming data. A rule of thumb is to have twice as many indexer CPUs as SH CPUs.

That said, you don't have enough SH CPUs to support 400 simultaneous searches. Even if only half of those users actually have the dashboard open, you're still 128 CPUs short. (Remember, each search has exclusive use of a CPU until it completes.)
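The arithmetic behind these figures can be sketched as follows. This is only a rough illustration that uses the numbers from this thread and the one-search-per-CPU rule of thumb stated above; the variable names are made up for the example, and it is not a sizing tool.

```python
# Hardware described in the question.
search_heads, cpus_per_sh = 3, 24
indexers, cpus_per_indexer = 4, 16

sh_cpus = search_heads * cpus_per_sh            # total SH CPUs
indexer_cpus = indexers * cpus_per_indexer      # total indexer CPUs

# Rule of thumb from this reply: twice as many indexer CPUs as SH CPUs.
recommended_indexer_cpus = 2 * sh_cpus

# Assumption from this reply: half of 400 concurrent users have the
# dashboard open, and each one is counted as a single search holding one CPU.
concurrent_searches = 400 // 2
sh_cpu_shortfall = concurrent_searches - sh_cpus

print(f"SH CPUs: {sh_cpus}, indexer CPUs: {indexer_cpus}")
print(f"Indexer CPUs suggested by the 2x rule: {recommended_indexer_cpus}")
print(f"SH CPU shortfall for {concurrent_searches} searches: {sh_cpu_shortfall}")
```

Note that a 12-panel dashboard may dispatch more than one search per user, so treating each user as a single search is probably the optimistic case.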

To reduce the load caused by the dashboard, consider using a saved search. The saved search can run in the background at some interval and the dashboard will load the most recent results of the search. Not only are you vastly reducing your CPU costs, but now each user of the dashboard sees the same values.
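To get a feel for the difference, here is a minimal sketch that compares how many searches are dispatched in the two approaches. It assumes one search per panel, one scheduled saved search per panel, and that loading cached results costs nothing (in practice it is cheap rather than free). The function name and the viewer count are hypothetical.

```python
def searches_dispatched(viewers: int, panels: int, scheduled: bool) -> int:
    """Searches run for one refresh cycle of the dashboard.

    With inline searches, every viewer triggers every panel's search.
    With scheduled saved searches, each panel's search runs once per
    interval, no matter how many viewers load the results.
    """
    if scheduled:
        return panels
    return viewers * panels


panels = 12      # panels on the dashboard in the question
viewers = 200    # assumed number of users with the dashboard open

inline = searches_dispatched(viewers, panels, scheduled=False)
shared = searches_dispatched(viewers, panels, scheduled=True)
print(f"Inline searches: {inline}, scheduled searches: {shared}")
```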

---
If this reply helps you, Karma would be appreciated.


pgadhari
Builder

OK. Suppose I add more CPUs to my SHs and indexers. I am planning to give each of the 3 search heads and 4 indexers a total of 32 CPUs. That should help to handle some of the load we are expecting, right?

Also, would it help if I increase the concurrency parameters in limits.conf on all the search heads?

############################################################################
# Concurrency
############################################################################
# This section contains settings for search concurrency limits.
# The total number of concurrent searches is
# base_max_searches + #cpus*max_searches_per_cpu

# The base number of concurrent searches.
base_max_searches = 6

# Max real-time searches = max_rt_search_multiplier x max historical searches.
max_rt_search_multiplier = 1

# The maximum number of concurrent searches per CPU.
max_searches_per_cpu = 1

I will set max_searches_per_cpu = 4, so the calculation would be as follows. Please advise whether this calculation would work, or whether I should consider any other parameters.

6 + 96*4 = 390. Would this be the total number of concurrent searches that can run?
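The formula quoted from limits.conf can be checked with a short sketch. The helper function is hypothetical and simply mirrors the comment `base_max_searches + #cpus*max_searches_per_cpu`. One assumption to verify: limits.conf is set on each search head, so `#cpus` may refer to the CPUs of a single instance rather than the pooled 96. Both readings are shown.

```python
def max_concurrent_searches(cpus: int,
                            base_max_searches: int = 6,
                            max_searches_per_cpu: int = 1) -> int:
    """Mirror of the limits.conf comment:
    base_max_searches + #cpus * max_searches_per_cpu
    """
    return base_max_searches + cpus * max_searches_per_cpu


# Reading used in the post: all 96 SH CPUs pooled into one formula.
pooled = max_concurrent_searches(96, max_searches_per_cpu=4)

# Alternative reading (assumption): the limit applies per search head,
# each with 32 CPUs, and the three limits add up.
per_search_head = max_concurrent_searches(32, max_searches_per_cpu=4)
across_three_heads = 3 * per_search_head

print(f"Pooled: {pooled}")
print(f"Per search head: {per_search_head}, across 3: {across_three_heads}")
```

Either way, the result is only a ceiling on how many searches may be started; it says nothing about whether the CPUs, memory, and indexers can actually sustain that many.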


richgalloway
SplunkTrust

It's better to scale out (add servers) than up (add CPUs). Your solution is an improvement, but it still gives the indexing tier only 33% more capacity than the SH tier. Make sure your indexers have enough I/O throughput to handle both the search load and the indexing load.
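The 33% figure follows from the proposed CPU counts, as this small sketch shows. It assumes every server ends up with 32 CPUs and reuses the 2x rule of thumb from the earlier reply; the indexer count it prints is only what that rule implies, not a measured requirement.

```python
import math

cpus_per_server = 32                 # proposed size for every server
sh_cpus = 3 * cpus_per_server        # search head tier
indexer_cpus = 4 * cpus_per_server   # indexer tier

# How much more CPU the indexing tier has than the SH tier.
extra_capacity = (indexer_cpus - sh_cpus) / sh_cpus

# Indexers needed to reach twice the SH CPU count (rule of thumb).
indexers_for_2x = math.ceil(2 * sh_cpus / cpus_per_server)

print(f"Indexer tier has {extra_capacity:.0%} more CPUs than the SH tier")
print(f"Indexers needed for the 2x rule: {indexers_for_2x}")
```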

Changing max_searches_per_cpu is an option, but it's one I would only use if the CPUs in all tiers are sufficiently idle to support the extra workload.

---
If this reply helps you, Karma would be appreciated.

pgadhari
Builder

In our environment, I can see that the search heads are idle most of the time, hence I was thinking of changing the max_searches_per_cpu option. For now, I will go with this option and see how the servers behave. If any issues pop up, I will consider scaling out.

Is it OK to increase max_searches_per_cpu from 4 to 8 in case I face search performance issues? Also, are there any other parameters on the SHs or indexers that could improve search performance without scaling out? Please advise.


richgalloway
SplunkTrust

Increasing max_searches_per_cpu as an experiment is worth a try. If you have performance issues, reduce the setting.

Keep an eye on memory use. Additional concurrent searches will use more memory.

---
If this reply helps you, Karma would be appreciated.

pgadhari
Builder

OK, sure. Thanks for your expert recommendations, and I appreciate your prompt reply. I will keep the servers under monitoring and keep an eye on memory consumption too. I will accept your answer for now, but if any issues come up after the rollout, I would need your recommendations again. Thanks again.
