Auto Scaling Splunk Indexer Cluster

luhadia_aditya · ‎12-29-2020

Hello Splunk Gurus,

I would like to understand whether Splunk has solved the problem of auto-scaling a Splunk indexer cluster in AWS, based on incoming data volume, using tools such as K8s, Terraform, or anything else.

The problem statement is:

Spin up more indexing nodes automatically as the data volume increases

Provision an AWS instance with Splunk image
Mount the data volume
Add the indexer into existing cluster as a peer to store and replicate the buckets

Remove indexing nodes as data volume decreases, automatically

Inform the Cluster Master about scaling down
Remove the indexer(s) from the cluster
Unmount the data volume and free-up the disk space back to AWS
De-commission the AWS instances

Making sure the data is fully available and searchable during this process
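The scale-up and scale-down steps listed above can be sketched as an ordered plan. This is only an illustration of the sequencing: the function name `plan_steps`, the `idx-N` peer names, and the step strings are hypothetical, and nothing here calls AWS or Splunk.

```python
def plan_steps(current_peers: int, target_peers: int, replication_factor: int = 2) -> list[str]:
    """Return the ordered actions needed to move from current to target peer count."""
    # A cluster cannot hold RF copies of a bucket on fewer than RF peers.
    if target_peers < replication_factor:
        raise ValueError("target_peers must be at least the replication factor")

    steps: list[str] = []
    if target_peers > current_peers:
        # Scale up: provision, mount storage, then join the cluster.
        for i in range(current_peers + 1, target_peers + 1):
            name = f"idx-{i}"  # hypothetical naming scheme
            steps += [
                f"provision {name}",
                f"mount volume on {name}",
                f"join {name} to cluster",
            ]
    else:
        # Scale down one peer at a time, newest first.
        for i in range(current_peers, target_peers, -1):
            name = f"idx-{i}"
            steps += [
                f"notify manager about {name}",
                f"offline {name}",
                f"unmount volume on {name}",
                f"terminate {name}",
            ]
    return steps


if __name__ == "__main__":
    for step in plan_steps(current_peers=4, target_peers=2):
        print(step)
```

Removing peers one at a time, and waiting for the cluster to settle between removals, is an assumption of this sketch rather than something the post specifies.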

The purpose of this exercise is to save AWS cost, since it is a pay-as-you-use model. On days with less incoming data, a few of the indexing nodes could be shut down, because they are mostly underutilized on such days: there is less search activity and less data to index.

My biggest concern about auto-scaling is that buckets are replicated randomly across all the indexers of the cluster. If n indexer nodes are shut down to save cost on a day with less incoming data, say over the weekend, the data is no longer completely available.
And with SF=2, RF=2, if the cluster recovers to a complete state while those n nodes are shut down, there will be many excess buckets on Monday, when those nodes rejoin the cluster to handle the weekday traffic.
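To put a rough number on the availability concern, here is a small sketch. It assumes each bucket's copies sit on distinct peers chosen uniformly at random, and that the n peers are simply stopped without the cluster re-replicating first. Both are simplifying assumptions, and the function name is hypothetical.

```python
from math import comb


def fraction_of_buckets_unavailable(total_peers: int, removed_peers: int,
                                    replication_factor: int = 2) -> float:
    """Expected fraction of buckets whose copies ALL sit on the removed peers."""
    if not 0 <= removed_peers <= total_peers:
        raise ValueError("removed_peers must be between 0 and total_peers")
    if not 1 <= replication_factor <= total_peers:
        raise ValueError("replication_factor must be between 1 and total_peers")
    # Ways to place every copy on removed peers, over all possible placements.
    # comb(n, k) is 0 when k > n, so removing fewer than RF peers loses nothing.
    return comb(removed_peers, replication_factor) / comb(total_peers, replication_factor)


if __name__ == "__main__":
    # Hypothetical cluster: 10 peers, RF=2, 3 peers stopped over the weekend.
    print(f"{fraction_of_buckets_unavailable(10, 3):.1%}")
```

Under these assumptions, stopping fewer peers than the replication factor leaves at least one copy of every bucket, while stopping more leaves some buckets with no surviving copy unless the cluster repairs itself in between.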

Answers I seek: I would like insights into the approach and strategy for solving this problem, if someone and/or Splunk has already solved it, for example with the Splunk Cloud offering.
I would also like assessment input from the community and Splunk gurus/architects on whether this is really a problem worth solving, or whether it makes sense at all. It may be an absurd idea, and I am fine learning that. 🙂

Thanks!

mattymo · ‎02-28-2021

Hello, please check out the Splunk Operator for Kubernetes project, available in beta now! This functionality is something we would like to deliver as part of this project. Please join us and try it out!

https://github.com/splunk/splunk-operator/tree/develop/docs#getting-started-with-the-splunk-operator...

- MattyMo

ericjorgensenjr · ‎01-05-2021

You could accomplish this type of functionality by running your Indexers in a Docker container and managing them with Kubernetes.

This is probably a good place to start looking if you're interested: https://github.com/mhassan2/splunk-n-box

scelikok · ‎12-30-2020

Hi @luhadia_aditya,

I haven't implemented this kind of automation myself, but you can achieve it by using SmartStore. Since all warm and cold buckets will sit in an AWS S3 bucket (or any S3-compatible storage), replication will only involve the indexers' local caches. You can set up your decommission scripts to run splunk offline with the --enforce-counts parameter. This ensures that the offlined peer's replicated data is available on the other peer nodes in the cluster.

But in any case your searches will be disrupted while decommissioning.
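As a minimal sketch of the decommission step, the helper below only builds the argument list for `splunk offline --enforce-counts`. The helper name and the `/opt/splunk` install path are assumptions; authentication and error handling are left out.

```python
def offline_command(splunk_home: str = "/opt/splunk", enforce_counts: bool = True) -> list[str]:
    """Build the argv for taking a peer offline; it does not execute anything."""
    cmd = [f"{splunk_home}/bin/splunk", "offline"]
    if enforce_counts:
        # Ask the cluster to restore the replication and search factors
        # before the peer goes down for good.
        cmd.append("--enforce-counts")
    return cmd


if __name__ == "__main__":
    print(" ".join(offline_command()))
```

A decommission script would run this on the peer being removed, for example with `subprocess.run(cmd, check=True)`, and only unmount the volume and terminate the instance after the command has completed.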

Newly commissioned nodes will start downloading the requested buckets from S3 when needed. This will have little or no effect on searches.

Please keep in mind that the number of indexers in the cluster depends on the search load as well. In fact, search load has more effect on indexers than data ingestion does. You should also use search load (users' ad-hoc searches, scheduled reports, alerts, dashboards, etc.) as a parameter when deciding the required node count.
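One way to express this sizing rule is to take the larger of an ingest-based and a search-based estimate. The per-peer capacities below are placeholders that you would have to measure in your own environment, and the function name is hypothetical.

```python
from math import ceil


def required_peers(daily_ingest_gb: float, concurrent_searches: int,
                   ingest_gb_per_peer: float, searches_per_peer: int,
                   replication_factor: int = 2) -> int:
    """Peer count needed to cover both ingest and search load."""
    if ingest_gb_per_peer <= 0 or searches_per_peer <= 0:
        raise ValueError("per-peer capacities must be positive")
    by_ingest = ceil(daily_ingest_gb / ingest_gb_per_peer)
    by_search = ceil(concurrent_searches / searches_per_peer)
    # Never go below RF, or the cluster cannot hold all bucket copies.
    return max(by_ingest, by_search, replication_factor)


if __name__ == "__main__":
    # Hypothetical weekend: ingest drops to 300 GB/day, but 40 scheduled
    # searches still run concurrently.
    print(required_peers(300, 40, ingest_gb_per_peer=100, searches_per_peer=8))
```

With these made-up numbers the search load, not the ingest volume, decides the node count, which is the situation described above.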

I hope this gives you an idea.

If this reply helps you, an upvote and "Accept as Solution" are appreciated.

isoutamo · ‎01-05-2021

Hi

I haven't heard of this kind of system. As @scelikok said, you must also take care of searches, reports, alerts, etc., not only the indexing part. My preferred solution would be to use SmartStore instead of autoscaling.

Have you calculated what kind of savings you would achieve with this kind of automation? As @scelikok already said, there will be interruptions to searches when you take indexers out of the cluster. I suppose this is basically doable, but you must code that logic yourself instead of using AWS's autoscaling directly. To get real savings, there must be a significant difference in resource usage over time.
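A back-of-the-envelope version of that savings calculation could look like this. All inputs are hypothetical, and the sketch counts instance hours only: storage, data transfer, and the effort of building the automation are ignored.

```python
def monthly_instance_savings(peak_peers: int, offpeak_peers: int,
                             offpeak_hours_per_month: float,
                             hourly_price: float) -> float:
    """Instance cost avoided by running fewer peers during off-peak hours."""
    if offpeak_peers > peak_peers:
        raise ValueError("offpeak_peers cannot exceed peak_peers")
    return (peak_peers - offpeak_peers) * offpeak_hours_per_month * hourly_price


if __name__ == "__main__":
    # Hypothetical: 10 peers on weekdays, 6 on weekends, about 200 weekend
    # hours per month, 0.50 per instance hour.
    print(monthly_instance_savings(10, 6, 200, 0.5))
```

If the result is small compared with the total bill, or with the cost of search interruptions, the automation is probably not worth building.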

r. Ismo
