What will happen to our index cluster if we add an...

bjarnedein · ‎11-06-2018

Hi Guys,

Maybe a bit of a challenging question, but how "intelligent" are Splunk clusters really?

Say you have an Index Cluster with 10 servers already running, each with a 12-core CPU, and we need more cores in the cluster to deal with the rising demand for ingesting even more incoming events.

All hosts(Linux) are virtual on VMWare.

What will happen to the Index Cluster if we add another 5 Index Servers to the existing cluster, each with fewer cores (6 each)?

In other words, even though it might not be the optimal solution, will the Index Cluster still benefit from adding more servers with fewer cores each (compared to the existing ones)?

If it will benefit, are there any tuning options and/or configs that will help the cluster perform optimally with divergent servers in it?

PS: The reason for asking is that, right now, it’s much faster to get new servers with 6 cores.

I'd be most happy to get some input on this subject, and, in general, hear a bit more about how "intelligent" and "flexible" the different Splunk instances are in dealing with divergence in capacity within clusters (indexer and Search Heads).

Best Regards,

Bjarne Dein

skalliger · ‎11-06-2018

Hi bjarnedein,

That's really difficult to answer without knowing anything about your cluster configuration and your backend. It depends on several factors. Your cluster will be as intelligent as you configured it to be.

You're saying you have to ingest more data. How much are we talking about per day?
What does your VM backend look like? Do you have enough IOPS to ingest that much data? If you simply add more servers attached to the same SAN, you might run into a bottleneck sooner rather than later. Depending on your existing servers, you can already tune your config; take a look at managing pipeline sets. If everything is virtualized, I'd bet IOPS will become a bigger problem than your cores.
Also depending on your setup, you can tune the load balancing settings (e.g. a shorter frequency) and bucket rolls to be more efficient.
Your cluster master is basically the brain of your indexer cluster. More indexers means more compute available, and thus more time for the existing servers to do other things.
More indexers doesn't necessarily mean a higher replication factor. What SF and RF are you using? The master tracks which indexers hold a copy of each bucket, so the more indexers you have, the better you can distribute your data (with an RF of 3 and 9 indexers, only 2 indexers besides the originating one would hold a copy of a given bucket).
If your servers (both Splunk and sending ones) are located in different data centres, you might even want to think about splitting your cluster into a multisite cluster with site awareness to reduce network load between data centres.
Taking a look at the Monitoring Console (formerly known as DMC) is always a good start to get an overview of your indexer (or search head) cluster and identify possible problems or bottlenecks.
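
To make the replication point above concrete, here is a minimal Python sketch. It assumes each bucket ends up with exactly `replication_factor` copies on distinct peers and that the peers are picked at random. That is a simplification for illustration only, not how the cluster master actually chooses targets. The function name `place_bucket` and the peer names are hypothetical.

```python
import random

def place_bucket(peers, replication_factor, rng):
    """Pick the peers that hold copies of one bucket (illustrative only)."""
    if replication_factor > len(peers):
        raise ValueError("replication factor cannot exceed the number of peers")
    # The first entry plays the role of the originating peer,
    # the remaining entries hold the replicated copies.
    return rng.sample(peers, replication_factor)

peers = [f"idx{n:02d}" for n in range(1, 10)]  # 9 hypothetical indexers
rng = random.Random(0)                         # fixed seed for repeatability

holders = place_bucket(peers, 3, rng)          # RF of 3
origin, replicas = holders[0], holders[1:]

print(f"origin={origin}, replicas={replicas}")
print(f"{len(holders)} of {len(peers)} peers hold this bucket")
```

With more peers in the list, the same number of copies is spread over a larger pool, which is the distribution effect described above.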

Edit: fixed misleading bucket explanation

Skalli

bjarnedein · ‎11-06-2018

Hi Skalli,

Thanks for your response.

Approx 4TB/day
We have seen IOPS issues even though we're running everything on SSD SAN disks, and we have experts looking into it.
We're aware of this as well, but this is not what I'm after at all in this question :-)
Yes sure, the CM is the "brain", but exactly how smart is it? My question here is all about exactly that!
SF:2 RF:3
Multisite is in the pipeline, but we have another capacity need before we get there.
The DMC is great, but again it can't answer what I'm looking for :-)

Let me be more precise here: What will happen to our Index Cluster if we add 12 more servers, each with 6 cores (72 more cores in total), to our existing 16 indexers with 12 cores each (192 cores in total)?
Will we gain 50% more indexing and/or search capacity?
How will the CM handle the divergence in cores per host?
Do we HAVE TO have completely identical hosts in the Index Cluster (I know it's recommended, but what's recommended is not always possible to get)?
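
For reference, the raw arithmetic behind the numbers above can be sketched in a few lines of Python. The per-host ingest figures assume that incoming data is spread evenly across all hosts regardless of core count, and that "approx 4TB/day" is taken as 4000 GB. Both are assumptions made for illustration, not measured values.

```python
existing_hosts, existing_cores_per_host = 16, 12
new_hosts, new_cores_per_host = 12, 6

existing_total = existing_hosts * existing_cores_per_host  # 192 cores
added_total = new_hosts * new_cores_per_host               # 72 cores
core_increase = added_total / existing_total               # 0.375, i.e. 37.5%

daily_ingest_gb = 4000  # assumption: "approx 4TB/day" rounded to 4000 GB

# Assumption: data is spread evenly per host, not per core.
per_host_before = daily_ingest_gb / existing_hosts
per_host_after = daily_ingest_gb / (existing_hosts + new_hosts)

print(f"core increase: {core_increase:.1%}")
print(f"GB/day per host before: {per_host_before:.0f}")
print(f"GB/day per host after:  {per_host_after:.0f}")
```

So on core count alone the addition is 37.5% rather than 50%, and under the even-spread assumption a 6-core host would receive the same share of data as a 12-core host.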

This IS a Splunk infrastructure challenge, and I'd like/need to know how flexible/smart the system is.

Best Regards,

Bjarne Dein

skalliger · ‎11-06-2018

Hi Bjarne,

So you already have a pretty big setup. This is not easy to scale anymore, and I would suggest contacting Splunk's Professional Services for further assistance. As far as I am aware, it's not a good idea to mix several high-end indexers (btw, 12-core indexers are not high-end) with much smaller ones.

Your question was how you can get more events in per second, which does not necessarily mean deploying more indexers. That's why I suggested looking into tuning and the Monitoring Console. Why? Because an incoming event goes through several steps, each of which can slow your event processing down. Your events pass through multiple queues: parsing, aggregation, typing, indexing. The Monitoring Console might actually show where you can further debug your cluster. More cores for the existing indexers might even help, as they would increase the number of possible threads on an indexer. Just an example. You get what I mean.
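
As a rough illustration of hunting for blocked queues, here is a small Python sketch that counts `blocked=true` entries per queue. The sample lines are made up and only approximate the key=value shape of queue entries in `metrics.log`; the exact field layout is an assumption, and `blocked_queue_counts` is a hypothetical helper, not a Splunk tool.

```python
import re
from collections import Counter

# Made-up sample lines; they only approximate the key=value shape of
# queue entries in metrics.log and are not real output.
SAMPLE_LINES = [
    "11-06-2018 10:00:00.000 +0100 INFO Metrics - group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=10",
    "11-06-2018 10:00:00.000 +0100 INFO Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "11-06-2018 10:00:30.000 +0100 INFO Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=500",
    "11-06-2018 10:00:30.000 +0100 INFO Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=498",
]

# Matches key=value pairs, with values ending at a comma or whitespace.
PAIR = re.compile(r"(\w+)=([^,\s]+)")

def blocked_queue_counts(lines):
    """Count how often each queue is reported as blocked."""
    counts = Counter()
    for line in lines:
        fields = dict(PAIR.findall(line))
        if fields.get("group") == "queue" and fields.get("blocked") == "true":
            counts[fields.get("name", "unknown")] += 1
    return counts

counts = blocked_queue_counts(SAMPLE_LINES)
for name, hits in counts.most_common():
    print(f"{name}: blocked in {hits} sample(s)")
```

A queue that keeps showing up as blocked points at the stage behind it as the likely bottleneck, which is the kind of hint the Monitoring Console surfaces graphically.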

So, no, I would not expect a rise of up to 50% in indexing rate, especially not if the same SAN with the same disk controllers sits behind the new servers. I would try to increase the CPU cores and the RAM of your existing indexers first, before bringing new ones into the cluster, and look for any blocked queues that take too long to process events.

Having 12 cores only meets the requirements of a reference host, so giving all of your indexers more cores and more RAM (depending on your overall load), together with the IOPS increase mentioned above, would be my first step, along with more parallel pipelines if needed. Around 1500 IOPS per indexer shouldn't be a problem with an SSD-backed SAN. That's my personal opinion though.
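
To put the shared-SAN concern into numbers, here is a back-of-the-envelope Python sketch. It assumes every indexer needs the roughly 1500 IOPS mentioned above at the same time and that all of them sit on one SAN. Both are simplifying assumptions, and `aggregate_iops` is a hypothetical helper.

```python
IOPS_PER_INDEXER = 1500  # rough per-indexer figure mentioned above

def aggregate_iops(indexer_count, iops_per_indexer=IOPS_PER_INDEXER):
    """Total IOPS the shared storage must sustain (worst case, all busy)."""
    return indexer_count * iops_per_indexer

current = aggregate_iops(16)        # the existing 16 indexers
expanded = aggregate_iops(16 + 12)  # after adding 12 more

print(f"current:  {current} IOPS")
print(f"expanded: {expanded} IOPS")
print(f"extra load on the same SAN: {expanded - current} IOPS")
```

If the SAN is already showing IOPS issues at the current level, the extra demand from new indexers lands on the same controllers.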

Skalli

bjarnedein · ‎11-06-2018

Hi Skalli,

Your points are valid, but unfortunately there are more constraints here.
We're over capacity on our Index Cluster, and we can't get more cores for the existing servers. That's just the way it is.
So, frankly, I'm looking for alternative ways to add more capacity while keeping the current servers.

Bjarne
