Solved: splunk index cluster + search head cluster upgrade...

bryanwiggins · 11-21-2016

Hi

Environment (all Linux-based):
3x index cluster peers
1x cluster master
1x deployer/license master
3x search head cluster peers
2x heavy forwarders

Question:
I have been reading the Splunk documentation on upgrading a cluster (http://docs.splunk.com/Documentation/Splunk/6.5.0/Indexer/Upgradeacluster), and my reading is that it is still not possible to perform rolling upgrades. Is that right? It would be a shame if so, as ELK is still a possible option for us, and AFAIK you can run rolling upgrades on ELK clusters.

If we cannot currently do rolling upgrades on our Splunk nodes, is there a preferred approach for queuing the data coming in from the heavy forwarders while the cluster is being upgraded?

I have been reading the documentation on the forwarder 'wait queue' (http://docs.splunk.com/Documentation/Splunk/6.5.0/Forwarding/Protectagainstlossofin-flightdata), and my thinking is that I could increase 'readTimeout' in 'outputs.conf' to some arbitrary value that covers the cluster upgrade window (I will test in a lab first to get the expected duration). Assuming we have enough disk capacity for the 'wait queue', is my thinking OK?
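
To reason about the 'wait queue' idea above, here is a simplified conceptual model of acknowledgment-based forwarding. It is a sketch under assumptions: the class `AckWaitQueue` and its methods are hypothetical and do not reflect Splunk's internal implementation. It only shows the general pattern, where each sent block is kept until the receiver acknowledges it and anything unacknowledged is re-sent after a timeout or reconnect.

```python
from collections import OrderedDict


class AckWaitQueue:
    """Conceptual model of an acknowledgment-based wait queue.

    Hypothetical class for illustration only; it is not Splunk code.
    Each sent block is kept until the receiver acknowledges it, so a
    block lost in transit can be re-sent instead of silently dropped.
    """

    def __init__(self, max_blocks: int):
        self.max_blocks = max_blocks
        self.pending = OrderedDict()  # block_id -> payload, in send order
        self.next_id = 0

    def send(self, payload):
        """Record a block as in flight; refuse it if the queue is full."""
        if len(self.pending) >= self.max_blocks:
            return None  # caller must stop reading input or buffer elsewhere
        block_id = self.next_id
        self.next_id += 1
        self.pending[block_id] = payload
        return block_id

    def ack(self, block_id):
        """Receiver confirmed the block was written; drop our copy."""
        self.pending.pop(block_id, None)

    def blocks_to_resend(self):
        """After a timeout or reconnect, every unacknowledged block goes again."""
        return list(self.pending.items())


if __name__ == "__main__":
    q = AckWaitQueue(max_blocks=2)
    first = q.send("event batch 1")
    q.send("event batch 2")
    print(q.send("event batch 3"))  # None: queue is full until an ack arrives
    q.ack(first)
    print(q.blocks_to_resend())  # [(1, 'event batch 2')]
```

In this model the wait queue is bounded, so once it is full the sender has to stop accepting new data. That bound, not only the timeout, limits how long an outage can be absorbed in memory.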

Also, does anybody know whether Splunk plans to support rolling upgrades in the near future? I'm sure I'm not the only one who sees this as more than desirable 🙂

Thx
Bry

masonmorales · 11-21-2016

Rolling upgrades are currently only supported for maintenance releases (e.g. v6.4.3 -> v6.4.4). I don't know what Splunk's roadmap is for extending rolling upgrade support, but I would expect them to extend it.
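
The rule above (rolling upgrades only within a maintenance release) can be expressed as a small version check. This is a hedged sketch that encodes only the rule as stated in this reply, not Splunk's official compatibility matrix; `is_maintenance_upgrade` is a hypothetical helper and assumes plain dotted numeric version strings.

```python
def is_maintenance_upgrade(current: str, target: str) -> bool:
    """True if target only bumps the third (maintenance) component,
    e.g. 6.4.3 -> 6.4.4. Hypothetical helper; assumes versions such
    as '6.4.3' or '6.5' made of dot-separated integers."""
    cur = [int(part) for part in current.split(".")]
    tgt = [int(part) for part in target.split(".")]
    # Pad short versions like '6.5' to three components ('6.5.0').
    cur += [0] * (3 - len(cur))
    tgt += [0] * (3 - len(tgt))
    # Same major.minor, and a newer maintenance number.
    return cur[:2] == tgt[:2] and tgt[2] > cur[2]


print(is_maintenance_upgrade("6.4.3", "6.4.4"))  # True
print(is_maintenance_upgrade("6.4.1", "6.5"))    # False
```

For example, 6.4.3 -> 6.4.4 passes, while the 6.4.1 -> 6.5 upgrade discussed in this thread does not.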

When the indexers are unreachable, events queue automatically in memory on the forwarders. If you expect the indexers to be unreachable for an extended period, you may want to enable persistent queues. See: https://docs.splunk.com/Documentation/Splunk/6.5.0/Data/Usepersistentqueues
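
When deciding whether the in-memory queue is enough or a persistent queue is needed, a back-of-the-envelope disk estimate helps. The sketch below is a rough calculation under stated assumptions: `persistent_queue_bytes` is a hypothetical helper, the event rate, average event size and outage length are values you would measure yourself, and the 1.5x headroom factor is an arbitrary safety margin. The result is the kind of figure you might then set as `persistentQueueSize` on the relevant network input stanza in inputs.conf.

```python
def persistent_queue_bytes(events_per_sec: float, avg_event_bytes: float,
                           outage_secs: float, headroom: float = 1.5) -> int:
    """Rough disk needed to buffer an indexer outage on one input.

    Hypothetical helper: all inputs are your own measurements or guesses,
    and `headroom` is an arbitrary margin for bursts.
    """
    raw = events_per_sec * avg_event_bytes * outage_secs
    return int(raw * headroom)


# Example assumptions: 2,000 events/s, ~500 bytes per event, 30-minute window.
needed = persistent_queue_bytes(2000, 500, 30 * 60)
print(f"{needed / 1e9:.1f} GB")  # 2.7 GB
```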


maciep · 11-21-2016

I'm pretty sure I submitted an enhancement request for this a few versions ago now. The fact that you have to completely take a cluster down to upgrade it really defeats the purpose of having a cluster.

That said, we still do rolling upgrades. We have devices that send thousands of events per second and we can't queue them long enough to upgrade our entire cluster. And even if we could, risking the loss of data isn't worth it.

I'd have to review our documentation, but I think we typically:

upgrade the cluster master
enable maintenance mode
upgrade the peers one by one (or several at a time)
upgrade the search head cluster (SHC) as described in the docs
upgrade the rest of the environment
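
The indexer-cluster part of the sequence above can be sketched as an ordered plan. This is a hypothetical Python helper, not a Splunk tool: host names are placeholders, "upgrade and start" stands for whatever package install and restart you normally perform, and the search head cluster and remaining components are deliberately left out. The CLI commands shown (`splunk enable maintenance-mode` and `splunk disable maintenance-mode` on the master, `splunk offline` on a peer) are standard Splunk commands, but check the docs for your version before relying on them.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (host, action)


def rolling_upgrade_plan(master: str, peers: List[str], batch_size: int = 1) -> List[Step]:
    """Ordered (host, action) steps for a rolling indexer-cluster upgrade.

    Hypothetical helper for illustration; 'upgrade and start' is a
    placeholder for your usual package install and restart.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # 1. Upgrade the master first, then enable maintenance mode so the
    #    cluster halts most bucket fix-up while peers are down.
    steps: List[Step] = [
        (master, "upgrade and start"),
        (master, "splunk enable maintenance-mode"),
    ]
    # 2. Take peers down in batches (batch_size=1 is the one-by-one approach).
    for i in range(0, len(peers), batch_size):
        batch = peers[i:i + batch_size]
        steps += [(peer, "splunk offline") for peer in batch]
        steps += [(peer, "upgrade and start") for peer in batch]
    # 3. Leave maintenance mode once every peer is back on the new version.
    steps.append((master, "splunk disable maintenance-mode"))
    return steps


for host, action in rolling_upgrade_plan("cm01", ["idx01", "idx02", "idx03"]):
    print(f"{host}: {action}")
```

`batch_size` models the "several at a time" variant. Keeping it at most one less than the replication factor means that, assuming all buckets are fully replicated, at least one copy of every bucket stays on a running peer while a batch is down.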

bryanwiggins · 12-08-2016

Hi maciep

Thanks for your pointers. I ran this up in the lab and it seemed to work, so I shall do some more testing while loading data (I ran out of time to do a deep test).

I had hoped to accept multiple answers, but I was unable to. I have awarded 10 points; hopefully that is OK, as I'm not sure what the correct protocol on this site is for awarding points (happy to be informed 🙂)

Thx
Bry

maciep · 12-08-2016

No problem, Bry. I never worried about points; I just hope it works for you.

Also, we just upgraded from 6.3.4 to 6.5.1 using this approach without issues. I'm not sure which problems require you to upgrade the entire cluster at once, but I don't think we ever ran into any of them.

bryanwiggins · 12-14-2016

maciep

Thank you for the update, very interesting. Maybe Splunk will formalize that approach for future version upgrades, as telling people to take the whole cluster offline is not ideal (though with a multisite cluster it is less of an issue).

Again, thank you!
Bry

bryanwiggins · 11-22-2016

Hi maciep

Thanks for your reply. I completely agree: being told to switch off an entire single-site cluster to perform an application upgrade is quite unsettling!

I'll run the steps in a lab to confirm the version upgrade, as they sound similar to the process I recently followed to patch the OS kernels on our Splunk hosts for the 'Dirty COW' vulnerability.

I'll report back on the test results.

Again, thanks for your response!

Thx
Bry

bryanwiggins · 11-30-2016

Slight delay in running this up in my lab; it's scheduled for this week, though. I will post my findings.

Thx
Bry

bryanwiggins · 11-22-2016

Hi masonmorales

Thanks to you too for your response to my query. I like the 'persistent queues' option, and thanks for that doc link: it's similar to what I was reading but more targeted to our specific needs (TCP data inputs).

I shall run this up in the lab this week and report back on the results.

Thanks again!
Bry

bryanwiggins · 12-08-2016

Thanks masonmorales

I accepted this as the answer because it directly addresses how to deal with queued data on the heavy forwarders. I also tested the rolling upgrade approach (from maciep), which seemed to work.

Thx
Bry

splunk index cluster + search head cluster upgrade 6.4.1 to 6.5. Holding forward data in a queue
