Deployment Architecture

Migration from Standalone to indexer cluster (2 indexers + 1 SH)

norbertt911
Communicator

Hi fellows, 

It's time to migrate our good old standalone Splunk Enterprise to a 'Small enterprise deployment'.

I read through tons of docs but unfortunately didn't find any step-by-step guide, so I have many questions. Maybe some of you can help.

- The existing server is CentOS 7; the new servers will be Ubuntu 22.04. Just before the migration, I plan to upgrade Splunk on it from 9.1.5 to the latest 9.3.1 (it wasn't updated because CentOS 7 is not supported by 9.3.1). OR do I set up the new servers with 9.1.5 and upgrade them after the migration?

- Our daily volume is 300-400 GB/day. It will not grow drastically in the medium term. What are your recommended hardware specs for the indexers? Can we use the "Mid-range indexer specification" or should we go for the "High-performance indexer specification" (as described at https://docs.splunk.com/Documentation/Splunk/9.3.1/Capacity/Referencehardware )?

- If I understand correctly, I can copy /etc/apps/ from the old server to the new search head, so I will have all the apps, saved searches, etc. But what config files must be modified to get the data from the new indexers? (For forwarders this is clear, but we are using a lot of other inputs: REST API, HEC, scripted.)
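For reference, pointing a search head at a new indexer cluster is a server.conf change rather than an app change; a minimal sketch (hostname, port, and secret are placeholders):

```
# server.conf on the new search head (all values are placeholders)
[clustering]
mode = searchhead
manager_uri = https://cm.example.local:8089
pass4SymmKey = <cluster_secret>
```

Modular, REST, HEC, and scripted inputs travel with the copied apps, but any outputs.conf inside those apps that names the old indexer would need updating as well.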

- Do I configure our existing server as part of the indexer cluster (3rd member), then remove it once all the replications to the new servers are done? Or do I copy the index data to one of the new indexers, rename the buckets (adding the indexer's unique ID), and let the cluster manager do the job? (Do I need a separate cluster manager, or could the SH do the job?)
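For reference, joining the existing server to the cluster as a temporary third peer is also mostly a server.conf change; a sketch assuming a separate cluster manager already exists (names and secrets are placeholders):

```
# server.conf on each indexer peer (all values are placeholders)
[replication_port://9887]

[clustering]
mode = peer
manager_uri = https://cm.example.local:8089
pass4SymmKey = <cluster_secret>
```

Once replication to the new peers completes, the old peer can be decommissioned with `splunk offline --enforce-counts`, which makes the cluster meet its replication and search factors before the peer goes away.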

And here comes the big twist...

- Currently, we are using S3 storage via a NAS bridge for the cold buckets. This solution is not recommended and we are already experiencing problems. So we plan to change the configuration to use SmartStore. How can I move the current cold buckets there? (It's a lot of data, and because of the NAS bridge it is very, very slow to copy...)
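For reference, SmartStore is configured in indexes.conf by defining a remote volume and pointing indexes at it; a minimal sketch assuming an S3-compatible endpoint (bucket name, endpoint, and keys are placeholders):

```
# indexes.conf (all values are placeholders)
[volume:remote_store]
storageType = remote
path = s3://splunk-smartstore-bucket
remote.s3.endpoint = https://s3.private.cloud.local
remote.s3.access_key = <access_key>
remote.s3.secret_key = <secret_key>

[main]
remotePath = volume:remote_store/$_index_name
```

Per the SmartStore migration docs, when an existing index is switched to a remotePath, its existing warm and cold buckets are uploaded to the remote store by the cache manager, so the bulk copy runs over the indexers' own S3 connection rather than through the NAS bridge.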

Thanks in advance

Norbert

 


PickleRick
SplunkTrust

Well... that's why normally you hire a skilled architect for such a job 😉

But seriously.

There are several different issues to address here, and they might be resolved in different ways depending on your particular circumstances. Take your "migrate then upgrade" vs. "upgrade then migrate" question: both are valid scenarios and both have their pros and cons. I'd probably go for upgrade-then-migrate because it limits the number of upgrades you have to do, but you might want to migrate first so you have the target architecture in place before attempting upgrades (and so you can roll back to the old version while keeping the updated architecture).

The typical approach would probably be to first separate the SH tier, make sure everything is working OK, and then expand your indexer tier to a clustered setup.

Yes, you need a separate CM. Theoretically you could perhaps host the CM on the same machine as the SH, but you should definitely _not_ do that. The SH is for searching; the CM is for managing the cluster. You don't want to mix those functionalities.
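For reference, a minimal dedicated-CM configuration (the factors shown match a 2-indexer cluster; the secret is a placeholder):

```
# server.conf on the dedicated cluster manager (values are placeholders)
[clustering]
mode = manager
replication_factor = 2
search_factor = 2
pass4SymmKey = <cluster_secret>
```

With only two peers, replication_factor and search_factor cannot exceed 2; a third peer would allow raising them later.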

The specs... depend on your projected usage and load. As simple as that. We can't tell you beforehand what your load will be.

Can't help with the S3 issue - not enough experience to reliably advise on it.

norbertt911
Communicator

Thank you for all of this. Every bit of information will be helpful.  Believe me, if I could, I would hire a whole team for this. 🙂 But I'm just an average security guy here who "has some clue about Splunk". The wallet is owned by someone else...

BR.


isoutamo
SplunkTrust

Hi

I must agree with @PickleRick that this is something where you should hire an experienced Splunk consultant with good knowledge of the infra part too. You definitely need someone to help you!

There is a lot of missing information needed to help you choose the correct path. At least we need the following:

  • are you currently on-prem with hardware, or in some virtual environment
  • are you in a cloud (AWS, Azure, GCP)
  • what is your target platform (still on-prem with HW, virtual, some cloud)
  • are those S3 buckets on-prem, in AWS, or somewhere else
  • what kind of connectivity do you have between the Splunk servers and S3

If you must do this by yourself, without help from an experienced Splunk consultant, I would probably try the following approach, but this definitely depends on the answers to the questions above.

  • set up an additional server with the new OS but the current Splunk version
  • migrate the current Splunk installation onto it (e.g. https://community.splunk.com/t5/Deployment-Architecture/Splunk-Migration-from-existing-server-to-a-n...)
  • upgrade it to the target Splunk version
  • add a new SH and migrate (move) the SH-side apps onto it
  • add a new cluster manager and copy the indexer-side apps & TAs into its manager_apps
  • add the migrated node to the cluster as the 1st indexer
  • add 2nd (and maybe 3rd) nodes as additional indexers
  • if and only if you have a fast enough storage network to the S3 buckets, enable SmartStore on this cluster

If the above works without issues, stop the original standalone instance and start the production migration from scratch, as you will have proven that the procedure works and will have step-by-step instructions for how to do it.

After you have done the real production migration, change the UFs and other sources to send events to the new environment.
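Redirecting the UFs can be done per-forwarder, or centrally with indexer discovery so the CM hands out the current peer list; a sketch (names and secrets are placeholders):

```
# outputs.conf on each universal forwarder (all values are placeholders)
[indexer_discovery:new_cluster]
manager_uri = https://cm.example.local:8089
pass4SymmKey = <discovery_secret>

[tcpout:new_cluster_group]
indexerDiscovery = new_cluster
useACK = true

[tcpout]
defaultGroup = new_cluster_group
```

The matching [indexer_discovery] stanza with the same pass4SymmKey must also be set in server.conf on the CM; with discovery enabled, adding or removing indexers later requires no forwarder-side changes.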

r. Ismo

PickleRick
SplunkTrust

There are two possible approaches:

1. Migrate the existing installation to a new OS and then "upgrade" it to a cluster.

2. Build the cluster first, adding new components with the new OS.

Both have their pros and cons.

Theoretically, Splunk advises that all components run "the same OS and version", whatever that means. Practically, of course, it's impossible to keep this requirement throughout the whole system lifecycle, if for no other reason than that mid-upgrade some part of your cluster runs an earlier version and some a later one. Also, since Splunk doesn't really rely on many components of the OS itself, it shouldn't matter that much as long as you're within the supported OS range (but yes, it can cause issues with support involvement should you need their help).

OTOH, if you fiddle with your only existing installation (as in "migrate first, clusterize later"), you of course take on additional risk from manipulating your only instance. If you feel confident in your backups 😉 that might be better from the supportability point of view. But of course it involves possibly longer downtimes.


norbertt911
Communicator

Yes, I also agree with @PickleRick, but sadly I need to cook with what I currently have...

We have an on-prem standalone. The OS must be replaced (CentOS 7), and even the hardware warranty expires at the end of this year. We have our own virtual environment and S3 as well. (I have system architect colleagues, but they are not "Splunk-related" ones.)

I have similar plans to what you describe. There is only one major difference: I plan to set up 2-3 heavy forwarders and migrate the current inputs to them. I can do this one by one and fast, without a huge outage. I will set up the new deployment in parallel, and when everything looks okay, I will redirect the HFs to the new deployment.

Only the cold buckets are "problematic" now. But we can still keep the old environment without new input and search historical data there if needed; once it expires, we stop the standalone...

Thank you for the insights!


PickleRick
SplunkTrust

Wait a second. "We have our virtual environment and S3 as well" - does that mean you're using SmartStore, or is this S3 unrelated to Splunk?


norbertt911
Communicator

We have S3. Currently, we are using an NFS bridge to mount it to the server and send the cold buckets there. The plan is to change to SmartStore.


isoutamo
SplunkTrust
You said S3, but is this AWS S3 or some other S3-compatible storage from another vendor?

norbertt911
Communicator

Our private cloud...


isoutamo
SplunkTrust
Then performance can be an issue with it. Basically, it must be capable of serving the full combined speed of your indexer nodes' (storage) interfaces, plus all other sources that are using it.

You definitely need to do performance tests with it before you take it into use!

With SmartStore, all buckets other than hot buckets live in the remote store. This means that from time to time Splunk wants to pull lots of them into your nodes' caches within a short period of time, and this needs a lot of network capacity all at once.

And remember that you cannot go back from SmartStore to traditional server storage without reindexing the data. There is no supported way to convert back from S2 to local storage!
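Cache sizing is the main knob that determines how often the indexers have to reach back to S3; it is set per indexer in server.conf (the size below is an arbitrary illustration, not a recommendation):

```
# server.conf on each indexer (size in MB; the value is illustrative only)
[cachemanager]
max_cache_size = 2000000
```

The larger the local cache is relative to the time range that is typically searched, the less an S3 throughput limit hurts day-to-day searching.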

isoutamo
SplunkTrust

Why do you want to put those HFs between the sources and the indexers? Usually it's better without them. Almost the only reason you'd need them is a security policy that requires isolated security zones, where you must use IHFs as gateways/proxies between those zones.

Or is it that you currently have some modular inputs or similar on this standalone instance? In that case your plan is correct: you should set up the needed number of HFs, but just to manage those inputs. Please remember that almost no inputs are HA-aware, and you cannot run them in parallel on several HFs at the same time.

Are those buckets just frozen storage, from which you thaw data into use if needed, or are they already used as SmartStore storage? If I understand right, the 1st option is currently in use?

If so, then just keep them as they are, or put them into some other storage. If I recall right, you cannot restore (thaw) them into a SmartStore-enabled cluster index?

Anyhow, as those are standalone buckets, I'd propose using an individual all-in-one box or indexer to restore them if/when needed. The rest of the time that box can be down.

And with SmartStore, especially on-prem, you must ensure and test that you have enough throughput between the nodes and the S3 storage!


norbertt911
Communicator

I have roughly 70% modular inputs, 25% forwarded, and 5% other (scripted, HEC).

Cold and frozen are on S3. In the last 5 years we have not needed to recall any frozen data, so this is not really important. (I will cross that river whenever needed. :))

What is important is the roughly 90-120 days of historical, searchable data. So I either move it back from cold after everything is set up, or I just wait until it ages out but keep the old server to search it...

"And withSmartStore, especially in on-prem, you must ensure and test that you have enough throughput between nodes and S3 storage! " Exactly that is what we are checking now. We can have 10G, but this is just theoretical because dedicated 10G is not possible...

 
