I have a customer who wants to tear down the entire cluster every week.
Long story short, they do not want long-lived VMs.
Does anyone know where I can find reference documentation or a statement from Splunk saying this is recommended?
I have done this occasionally when there were specific reasons (OS out of support, moving to AWS or Azure),
but not as regular maintenance work.
Is anyone else doing this?
yes ... or other container / serverless technology ...
Overall, I think what your client wants is a terrible idea, but maybe I am missing the use case or the reason behind it.
The first question you have to ask is: why? What's the reason for doing that? You'll need to do quite a lot of automation work.
I'm not sure you'll find many people using Splunk this way.
At a high level, though, it looks like you'll have to create VM templates for each Splunk role, i.e. indexers, search heads, Cluster Master, License Master, etc.
If it's only the clusters that you need to tear down, and not the forwarder layer, you'll also want to reserve IPs for your indexers, so that forwarders can resume sending data as soon as the clusters are available again.
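To illustrate the reserved-IP point: because the indexer addresses never change, the forwarders' outputs.conf can be generated once and left alone across rebuilds. Here is a rough sketch that renders one. The function name, the group name `primary_indexers`, and the IPs are made up for illustration, and 9997 is only the conventional receiving port, so adjust to whatever your indexers actually listen on.

```python
def render_outputs_conf(indexer_ips, port=9997, group="primary_indexers"):
    """Render a forwarder outputs.conf pointing at a fixed list of indexers.

    indexer_ips: the reserved indexer addresses (hypothetical values here).
    port:        receiving port on the indexers (9997 by convention).
    group:       name of the tcpout target group (hypothetical name).
    """
    servers = ", ".join(f"{ip}:{port}" for ip in indexer_ips)
    return (
        "[tcpout]\n"
        f"defaultGroup = {group}\n"
        "\n"
        f"[tcpout:{group}]\n"
        f"server = {servers}\n"
    )


if __name__ == "__main__":
    # Example addresses only; replace with your reserved indexer IPs.
    print(render_outputs_conf(["10.0.0.11", "10.0.0.12", "10.0.0.13"]))
```

As long as the rebuilt indexers come back on the same addresses, nothing on the forwarder side has to be redeployed.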
You'll need connectivity tests that run every time the clusters are brought back up, as well as a DevOps pipeline for installing Splunk and deploying the relevant Splunk confs.
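A connectivity test can start out very simple, e.g. checking that the expected TCP ports answer on each rebuilt host before you let the pipeline continue. A minimal sketch, assuming the default management port 8089 and the conventional receiving port 9997; the hostnames and function names are placeholders of mine, not anything Splunk ships.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_cluster(hosts, ports=(8089, 9997), timeout=3.0):
    """Check every host/port pair and return {(host, port): reachable}."""
    return {
        (host, port): port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }


if __name__ == "__main__":
    # Placeholder hostnames; replace with your rebuilt indexers.
    results = check_cluster(["idx1.example.internal", "idx2.example.internal"])
    for (host, port), ok in sorted(results.items()):
        print(f"{host}:{port} {'OK' if ok else 'UNREACHABLE'}")
```

This only proves the ports are listening. A fuller check would also confirm that data from the forwarders is actually being indexed after the rebuild.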
You'll also want automated LDAP integration for your SHs, so that your users can log in without the setup having to be redone manually each time.
I hope this gives you some tips.