Deployment Architecture

Splunk cluster hardware planning

jpillai
Path Finder

We are planning to upgrade our Splunk hardware. We currently have the setup below (a multisite indexer cluster with independent search head clusters) and we are facing problems with low CPU count and high disk latency (we currently have HDDs). We primarily index data through HEC.

 

Type                            Site  Nodes  CPU p/v (per node)  Memory GB (per node)
SH cluster                      1     4      16/32               128
Indexer cluster                 1     11     4/8                 64
Indexer manager/License master  1     1      16/32               128
SH cluster                      2     4      16/32               128
Indexer cluster                 2     11     4/8                 64
Indexer manager/License master  2     1      16/32               128

 

Daily indexing/license usage is 400-450 GB, which may grow further in the near future.
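To double-check that number against what the license master actually records, a standard search over license_usage.log along these lines should reproduce the daily totals (the 30-day window is just an example):

    index=_internal source=*license_usage.log type=Usage earliest=-30d@d
    | eval GB = b/1024/1024/1024
    | timechart span=1d sum(GB) AS daily_indexed_GB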

Search concurrency example for one instance from the 4-node SH cluster:

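For reference, similar concurrency figures can be pulled from the search head's own metrics.log with something like the sketch below; it assumes the group=search_concurrency events (field names may vary slightly by Splunk version), and <sh_host> is a placeholder:

    index=_internal source=*metrics.log* group=search_concurrency host=<sh_host>
    | timechart span=10m max(active_hist_searches) AS peak_historical max(active_realtime_searches) AS peak_realtime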

 

We are trying to come up with the best hardware configuration that can support such a load.

 

Looking at Splunk's recommended settings, we have come up with the config below. Can someone shed more light on whether this is an optimal config, and also advise on the number of SH machines and indexer machines needed with such new hardware?

Site 1: 3-node SH cluster, 7-node indexer cluster

Site 2: As we use site 2 for searching and indexing only when site 1 is unavailable, maybe it can be smaller?

Role         CPU (p/v)  Memory
Indexer      24/48      64 GB
Non-indexer  32/64      64 GB

gcusello
SplunkTrust

Hi @jpillai ,

two main things:

4/8 CPUs are very few for indexers; they should have at least 12 CPUs each (if you don't have ES or ITSI).

You should analyze your requirements, with special attention to expected input growth and to the number of scheduled searches and concurrent users. As a rule of thumb, one IDX is used for roughly every 200 GB/day indexed (less if you have ES or ITSI), so you have too many IDXs.
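As a rough illustration of that rule of thumb (the 200 GB/day-per-indexer figure and the 30% growth headroom below are assumptions for the example, not measurements from this environment):

    | makeresults
    | eval daily_gb=450, gb_per_indexer=200, growth_headroom=1.3
    | eval indexers_per_site=ceiling((daily_gb*growth_headroom)/gb_per_indexer)

which comes out at about 3 indexers per site before accounting for search load, ES/ITSI, or replication overhead.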

In addition, you should analyze the performance of your disks (storage and system disks) to find the correct number of IDXs, because you need at least 800 IOPS, better if more!
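If introspection data is being collected (it usually is by default), the IOStats component gives a rough view of what the current HDDs are actually delivering under load; it is not a substitute for a synthetic benchmark against the 800 IOPS target, and the data.* field names below are the ones used by the Monitoring Console and may differ by version:

    index=_introspection component=IOStats host=<indexer_host>
    | eval total_iops = 'data.reads_ps' + 'data.writes_ps'
    | timechart span=10m max(total_iops) AS peak_observed_iops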

About configurations, SHs usually require more CPUs than IDXs, so I'd use (if you don't have ES or ITSI):

  • SH and IDX: 24/48 CPUs 64 GB RAM,
  • HF, CM, SHC-D, MC and DS: 12/24 CPUs 64 GB RAM. 

About the secondary site, as @dural_yyz also said, during normal activity it is mainly used for data replication, but you should also analyze the worst case, so I'd use the same configuration as the main site.

Also, the Cluster Manager doesn't need to be so performant, and there must be only one in the cluster.

In other words, you can have only one CM because the cluster continues to run even if the CM is down, possibly keeping a silent (cold standby) copy to turn on if the primary site outage lasts longer than predicted.

Finally, I don't see in your infrastructure the SHC-Deployer, Monitoring Console, or Deployment Server, to which you can apply the same considerations as for the Cluster Manager.

Ciao.

Giuseppe


dural_yyz
Builder

Given the way hot/warm/cold buckets and bucket replication work, it is in your best interest to make the site 1 and site 2 indexing tiers identical.  Someone with advanced on-prem admin experience would be able to size this, but storage becomes your biggest concern with unaligned resources.

If you have some sort of business or budget constraint then I get why you would have unaligned sites - however, personally I would very strongly suggest that both sites have identical compute and storage capacity at the indexing tier.

Your individual indexer CPU count will determine how many concurrent searches can be run.  The compute power of your new machines appears acceptable from the minimal information available.  Keep an eye on skipped searches to confirm - the internal logs will indicate a skip reason (see the search below).  Ideally SH and IDX should have similar if not the exact same CPU core count.
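One common way to watch for that from the internal logs, assuming the standard scheduler.log fields, is to run something like this over your busiest week:

    index=_internal sourcetype=scheduler status=skipped
    | stats count BY reason, host
    | sort - count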


jpillai
Path Finder

Yeah, budget is a concern. Given that the secondary site will only be used during a site 1 failure, most of the hardware will just be sitting there without much activity, except maybe for the indexers doing some replication. So I am trying to see how we can minimize the hardware at site 2. We will probably be using site 2 for indexing and searching for maybe a few hours over a period of months, when site 1 is down or under maintenance.
