Deployment Architecture

Auto Scaling Splunk Indexer Cluster

luhadia_aditya
Path Finder

Hello Splunk Gurus,

I would like to know whether Splunk has solved the problem of auto-scaling an indexer cluster in AWS based on incoming data volume, using tools such as K8s, Terraform, or anything else.

The problem statement is:

  1. Spin up more indexing nodes automatically as the data volume increases
    1. Provision an AWS instance with Splunk image
    2. Mount the data volume
    3. Add the indexer into existing cluster as a peer to store and replicate the buckets
  2. Remove indexing nodes automatically as the data volume decreases
    1. Inform the Cluster Master about scaling down
    2. Remove the indexer(s) from the cluster
    3. Unmount the data volume and free-up the disk space back to AWS
    4. De-commission the AWS instances
  3. Make sure the data is fully available and searchable during this process
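
As a rough illustration of step 1.3, joining a freshly provisioned indexer to the cluster could be scripted along these lines. This is only a sketch: it assumes the older CLI naming (`-mode slave`, `-master_uri`; newer releases use `-mode peer` and `-manager_uri`), and the install path, host name, port and secret are placeholders. Nothing is executed; the function only builds the commands.

```python
from typing import List


def join_cluster_commands(
    master_uri: str,
    secret: str,
    replication_port: int = 9887,
    splunk_bin: str = "/opt/splunk/bin/splunk",  # assumed install path
) -> List[List[str]]:
    """Return the CLI invocations that turn a fresh indexer into a cluster peer.

    Nothing is run here; pass each list to subprocess.run() on the new node.
    The CLI will also need credentials (for example an -auth argument),
    which are left out of this sketch.
    """
    return [
        [
            splunk_bin, "edit", "cluster-config",
            "-mode", "slave",
            "-master_uri", master_uri,
            "-replication_port", str(replication_port),
            "-secret", secret,
        ],
        # The new configuration takes effect after a restart.
        [splunk_bin, "restart"],
    ]


if __name__ == "__main__":
    # Placeholder Cluster Master address and secret.
    for cmd in join_cluster_commands("https://cm.example.internal:8089", "changeme"):
        print(" ".join(cmd))
```

In an auto-scaling setup these commands would typically run from the instance's boot script, after the data volume from step 1.2 is mounted.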

The purpose of this exercise is to save AWS cost, since it is a pay-as-you-use model. On days with less incoming data, a few of the indexing nodes could be shut down, because they are mostly underutilized on such days due to less search activity and less data to index.

My biggest concern about auto-scaling is that buckets are replicated randomly across all the indexers of the cluster. If n indexer nodes are shut down to save cost on a day with less incoming data, say over the weekend, the data is no longer completely available.
And with SF=2, RF=2, if the cluster recovers to its full state while those n nodes are shut down, then on Monday there will be a lot of excess buckets when those nodes rejoin the cluster to handle the weekday traffic.
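
To put a rough number on the availability concern: if each bucket's copies are assumed to land on distinct peers chosen uniformly at random (a simplification, not how placement is guaranteed to behave), the expected fraction of buckets that lose every copy when n of N peers are stopped at once is C(n, RF) / C(N, RF). A small sketch of that arithmetic:

```python
from math import comb


def fraction_of_buckets_lost(total_peers: int, removed_peers: int, rf: int) -> float:
    """Expected fraction of buckets whose copies ALL sit on removed peers.

    Assumes each bucket's `rf` copies are on distinct peers picked uniformly
    at random, and that the peers are stopped abruptly with no fix-up.
    """
    if not 0 <= removed_peers <= total_peers:
        raise ValueError("removed_peers must be between 0 and total_peers")
    if not 1 <= rf <= total_peers:
        raise ValueError("rf must be between 1 and total_peers")
    # comb() is 0 when removed_peers < rf: no bucket can lose every copy.
    return comb(removed_peers, rf) / comb(total_peers, rf)


if __name__ == "__main__":
    # 10 peers, RF=2: stopping 1 peer loses nothing, stopping 3 loses 3/45.
    print(fraction_of_buckets_lost(10, 1, 2))
    print(fraction_of_buckets_lost(10, 3, 2))
```

Under this model, stopping fewer peers than RF at a time never makes a bucket fully unavailable, which is why the order and pace of removal matters as much as the count.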

Answers I seek: I would like insights into the approach and strategy for this problem, if someone (or Splunk, with its Splunk Cloud offering) has already solved it.
I would also like an assessment from the community and Splunk Gurus / Architects on whether this is really a problem worth solving, or whether it makes sense at all. It may be an absurd idea, and I am fine learning that. 🙂 

Thanks!

0 Karma

mattymo
Splunk Employee

Hello, please check out the Splunk Operator for Kubernetes project, available in Beta now! This functionality is something we would like to deliver as part of this project. Please join us and try it out!

https://github.com/splunk/splunk-operator/tree/develop/docs#getting-started-with-the-splunk-operator...

 

- MattyMo
0 Karma

ericjorgensenjr
Path Finder

You could accomplish this type of functionality by running your Indexers in a Docker container and managing them with Kubernetes.

This is probably a good place to start looking if you're interested: https://github.com/mhassan2/splunk-n-box

0 Karma

scelikok
SplunkTrust

Hi @luhadia_aditya,

I haven't implemented this kind of automation myself, but you can achieve it by using SmartStore. Since all warm and cold buckets will sit in an AWS S3 bucket (or any S3-compatible storage), replication only applies to the indexers' local caches. You can set up your decommission scripts to use the --enforce-counts parameter of the splunk offline command. This ensures that the offlined peer's replicated data is available on other peer nodes in the cluster.
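
A decommission script of that kind might look roughly like the sketch below. It is run on the peer being removed; the install path is an assumption, and it presumes the CLI can authenticate (credentials are omitted). By default it only returns the command without running it.

```python
import subprocess
from typing import List


def offline_command(
    enforce_counts: bool = True,
    splunk_bin: str = "/opt/splunk/bin/splunk",  # assumed install path
) -> List[str]:
    """Build the CLI call that takes this peer offline."""
    cmd = [splunk_bin, "offline"]
    if enforce_counts:
        # Ask the cluster to restore its replication and search factors
        # on the remaining peers before this peer shuts down.
        cmd.append("--enforce-counts")
    return cmd


def decommission_peer(dry_run: bool = True) -> List[str]:
    """Take the local peer offline; with dry_run=True nothing is executed."""
    cmd = offline_command()
    if not dry_run:
        # Raises CalledProcessError if the CLI reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(decommission_peer(dry_run=True)))
```

The instance should only be terminated once this command has completed; removing several peers one after another rather than all at once keeps the remaining copies in place.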

But in any case your searches will be disrupted while decommissioning. 

Newly commissioned nodes will start downloading the requested buckets from S3 when needed. This will have no or minimal effect on searches.

Please keep in mind that the indexer count of the cluster depends on the search load as well. In fact, search load has more effect on indexers than data ingestion. You should also use search load (users' ad-hoc searches, scheduled reports, alerts, dashboards, etc.) as a parameter when deciding the required node count.
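
As a minimal sketch of that idea, a scaling policy could take the larger of an ingest-based and a search-based estimate, and never go below the replication factor. The per-indexer capacity figures here are hypothetical inputs you would have to measure for your own environment; they are not Splunk sizing guidance.

```python
from math import ceil


def required_indexers(
    daily_ingest_gb: float,
    concurrent_searches: int,
    gb_per_indexer_per_day: float,  # hypothetical measured capacity
    searches_per_indexer: int,      # hypothetical measured capacity
    replication_factor: int = 2,
) -> int:
    """Return the peer count needed to cover both ingest and search load."""
    by_ingest = ceil(daily_ingest_gb / gb_per_indexer_per_day)
    by_search = ceil(concurrent_searches / searches_per_indexer)
    # The cluster cannot hold RF copies with fewer than RF peers.
    return max(by_ingest, by_search, replication_factor)


if __name__ == "__main__":
    # A quiet weekend: little ingest, but scheduled searches still run.
    print(required_indexers(200, 40, 300, 8))
```

In the weekend example the ingest alone would fit on one indexer, but the search load keeps the count at five, so ingest volume by itself would be a misleading scaling signal.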

I hope this gives you an idea.

 

If this reply helps you an upvote and "Accept as Solution" is appreciated.

isoutamo
SplunkTrust

Hi

I haven’t heard of this kind of system. As @scelikok said, you must also take care of searches, reports, alerts, etc., not only the indexing part. My preferred solution would be to use SmartStore instead of autoscaling.

Have you calculated what kind of savings you would achieve with this kind of automation? As @scelikok already said, there will be interruptions to searches when you take indexers out of the cluster. I suppose this is basically doable, but you must code that logic yourself instead of directly using AWS’s autoscaling. To get real savings, there must be a significant difference in resource usage over time.
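
A back-of-the-envelope version of that calculation could look like this. It counts instance hours only (storage and data transfer are ignored), and the hourly price is a placeholder, not a real AWS rate.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_savings(
    peers_removed: int,
    hours_off_per_week: float,
    hourly_cost_per_peer: float,  # placeholder price, not a real AWS rate
) -> float:
    """Instance cost avoided per week by stopping some peers."""
    return peers_removed * hours_off_per_week * hourly_cost_per_peer


def savings_fraction(
    total_peers: int, peers_removed: int, hours_off_per_week: float
) -> float:
    """Share of the always-on instance bill that is saved."""
    return (peers_removed * hours_off_per_week) / (total_peers * HOURS_PER_WEEK)


if __name__ == "__main__":
    # 10 peers, 3 of them stopped for a 48-hour weekend.
    print(weekly_savings(3, 48, 0.5))
    print(round(savings_fraction(10, 3, 48), 3))
```

In this example the saving is under 9% of the instance bill, which has to be weighed against the engineering effort and the search interruptions.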

r. Ismo

0 Karma