Deployment Architecture

Is it possible to run an indexer cluster with varying amounts of storage capacity and have it be workable?

kaufmanm
Communicator

We have an on-premises Splunk infrastructure with a cluster of indexers that each have 9 TB of usable storage. We are moving from 6 months of retention to 1 year and need to double our storage. Is it possible to run a cluster whose indexers have varying amounts of storage and actually make use of it?

For example, we could add new indexers with 30 TB of usable storage, but would the cluster be able to use any of the 21 TB beyond what the existing indexers have? It's basically just cold storage for data older than six months. Is there any documentation that specifies whether or not this works?

1 Solution

s2_splunk
Splunk Employee

In a cluster, all indexers need to have the exact same configuration, deployed via the cluster master, so what you are trying to do is unfortunately not currently possible in a single cluster.
You could set up the new indexers as a separate, second cluster and configure it to take full advantage of your disk space, but your current indexers may still age out data sooner than you want if they don't have enough space to keep a year's worth of data. That depends on how many new indexers you add, i.e. whether the per-indexer storage need drops to a number the existing disks can hold for a year. You would also need to find a way to send data to each cluster proportionally. That gets really messy quickly...
So, either match the existing storage capacity on the new indexers, or deploy new indexers with more disk space and, over time, upgrade storage on your existing nodes to match.
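
To make the "per-indexer storage need" point concrete, here is a rough Python sketch. It assumes the existing 9 TB per indexer is fully used at 6 months of retention, that doubling retention doubles the total need, and that data spreads evenly across all peers. The count of six existing indexers and both function names are hypothetical, chosen only for illustration.

```python
import math


def per_indexer_need_tb(existing_count, new_count,
                        current_need_tb=9.0, retention_factor=2.0):
    """Storage each peer needs if data spreads evenly across all peers.

    Assumes every existing indexer currently fills current_need_tb and that
    the total need scales linearly with retention (retention_factor).
    """
    total_need_tb = existing_count * current_need_tb * retention_factor
    return total_need_tb / (existing_count + new_count)


def min_new_indexers(existing_count, capacity_tb=9.0,
                     current_need_tb=9.0, retention_factor=2.0):
    """Fewest same-sized indexers to add so every peer fits in capacity_tb."""
    total_need_tb = existing_count * current_need_tb * retention_factor
    peers_required = math.ceil(total_need_tb / capacity_tb)
    return max(0, peers_required - existing_count)


if __name__ == "__main__":
    # Hypothetical cluster of six existing indexers.
    for added in (3, 6):
        need = per_indexer_need_tb(6, added)
        print(f"add {added} indexers: {need:.1f} TB needed per indexer")
    print("minimum 9 TB indexers to add:", min_new_indexers(6))
```

Under these assumptions, adding fewer new indexers than you already have leaves each peer needing more than 9 TB, so the existing nodes would still age data out early regardless of how large the new disks are.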

sdvorak_splunk
Splunk Employee

The short answer is "no". Splunk does not have the ability today to balance buckets by available storage on an Indexer. You can check out the docs for this here: http://docs.splunk.com/Documentation/Splunk/6.5.2/Indexer/Rebalancethecluster

In the future, we are hoping to make cluster balancing more intelligent so that it does what you suggest, but I'm not sure when that feature will surface in the product. For now, you will have to plan your architecture so that storage is added evenly across your indexers.

kaufmanm
Communicator

Appreciate the response. The next step for us once we double the storage is to move into AWS. Can we span a site in a multisite cluster between our datacenter and an AWS availability zone? e.g. We have three indexers in DC1, three indexers in DC2 in a multisite cluster where DC1 is a site and DC2 is a site. Can we add three indexers in AWS AZ1 and add them to the DC1 site within the multisite cluster, and the same for the other side? What kind of network bandwidth between the DC and AWS would that require, assuming most of 500 GB/day is generated in DC1 today? Then we could gradually add more indexers to AZ1 as we retire indexers in DC1 until we're fully in AWS?

sdvorak_splunk
Splunk Employee

kaufmanm,
Yes, you should be able to create a multisite cluster that spans on-prem and AWS (we have many customers doing this). What you can't/shouldn't do is put on-prem and AWS indexers together in the same site (which I think is what you are alluding to). AWS should be its own site that participates in the multisite cluster. But you would need enough indexers in AWS to handle the entire DC1 load...
DC1 = on-prem
AZ1 = AWS
Then you can start replicating DC1 to AWS AZ1. Once it is fully replicated, you could begin sending all newly forwarded data to the AWS AZ, which would allow you to shut down DC1 entirely at that point. The bandwidth required during this transition would be the bandwidth for your forwarded data (500 GB/day) plus the replication of buckets until you shut down DC1 (initially very high while existing buckets replicate, then dropping to just new buckets, roughly 250 GB/day). You would also need to include search results in your bandwidth needs, which will vary greatly depending on your environment.
You can estimate the bandwidth by averaging this across 24 hours, but that won't account for peak loads when forwarded data spikes. Academically speaking, though, you could get by with ~70 Mb/s of bandwidth (assuming 750 GB/day of total transfer: 750 GB × 8 bits ÷ 86,400 s ≈ 69 Mb/s).
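
The 24-hour averaging described above can be checked with a few lines of Python. This is only a rough sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 Mb = 10^6 bits), no protocol overhead, and no compression on the wire. `average_mbps` is a hypothetical helper name, not part of any Splunk tooling.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_mbps(gb_per_day, bytes_per_gb=10**9):
    """Average link rate in megabits per second for a daily transfer volume.

    Assumes the traffic is spread perfectly evenly over 24 hours, which
    ignores the peaks mentioned above.
    """
    bits_per_day = gb_per_day * bytes_per_gb * 8
    return bits_per_day / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    # Volumes taken from the thread: forwarded data, steady-state
    # replication, and their sum.
    for label, volume in (("forwarded", 500), ("replication", 250), ("total", 750)):
        print(f"{label:12s} {volume:4d} GB/day = {average_mbps(volume):5.1f} Mb/s")
```

Under these assumptions, 750 GB/day averages out to a little under 70 Mb/s. Since this is a floor rather than a sizing target, leave headroom for peaks, the initial bulk replication, and search traffic.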

kaufmanm
Communicator

Thanks for the response! Makes sense. We kicked the can down the road by expanding our on-prem storage for now; what you wrote out there will likely have to be the approach in the months leading up to getting booted from our datacenters.

kaufmanm
Communicator

Thanks, the next step for us once we double the storage is to move into AWS. Can we span a site in a multisite cluster between our datacenter and an AWS availability zone? e.g. We have three indexers in DC1, three indexers in DC2 in a multisite cluster where DC1 is a site and DC2 is a site. Can we add three indexers in AWS AZ1 and add them to the DC1 site within the multisite cluster, and the same for the other side? What kind of network bandwidth between the DC and AWS would that require, assuming most of 500 GB/day is generated in DC1 today? Then we could gradually add more indexers to AZ1 as we retire indexers in DC1 until we're fully in AWS?
