[ClusterLabs] Migrating off CentOS

Mon Jan 15 10:55:11 EST 2024

On Sat, 2024-01-13 at 09:07 -0600, Billy Croan wrote:
> I'm planning to migrate a two-node cluster off CentOS 7 this year.  I
> think I'm taking it to Debian Stable, but open for suggestions if any
> distribution is better supported by pacemaker.

Debian, RHEL, SUSE, Ubuntu, and compatible distros should all have good
support.

Fedora and FreeBSD get regular builds and basic testing but have fewer
users exercising them in production.

FYI, if you want to keep the interfaces you're familiar with, the free
RHEL developer license now allows most personal and small-business
production use: https://access.redhat.com/discussions/5719451

> 
> Have any of you had success doing major upgrades (bullseye to
> bookworm on Debian) of your physical nodes one at a time while each
> node is in standby+maintenance, and rolling the vm from one to the
> other so it doesn't reboot while the hosts are upgraded?  That has
> worked well for me for minor OS updates, but I'm curious about the
> majors.  
> 
> My project this year is even more major, not just upgrading the OS
> but changing distributions.
> 
> I think I have three possible ways I can try this:
> 1) wipe all server disks and start fresh.

A variation, if you can get new hosts, is to set up a test cluster on
new hosts, and once you're comfortable that it will work, stop the old
cluster and turn the new one into production.

> 
> 2) standby and maintenance one node, then reinstall it with a new OS
> and make a New Cluster.  shutdown the vm and copy it, offline, to the
> new one-node cluster. and start it up there. Then once that's
> working, wipe and reinstall the other node, and add it to the new
> cluster.

This should be fine.

> 
> 3) standby and maintenance one node, then Remove it from the
> cluster.  Then reinstall it with the new distribution's OS.  Then re-
> add it to the Existing Cluster.  Move the vm resource to it and
> verify it's working, then do the same with the other physical node,
> and take it out of standby&maint to finish.
> 

This would be fine as long as the corosync and pacemaker versions are
compatible. However as Michele mentioned, RHEL 7 uses Corosync 2, and
the latest of any distro will use Corosync 3, so that will sink this
option.

> (Obviously any of those methods begin with a full backup to offsite
> and local media. and end with a verification against that backup.)
> 
> #1 would be the longest outage but the "cleanest result"
> #3 would be possibly no outage, but I think the least likely to
> work.  I understand EL uses pcs and debian uses crm for example...

Debian offers both IIRC. But that won't affect the upgrade, they both
use Pacemaker command-line tools under the hood. The only difference is
what commands you run to get the same effect.

> #2 is a compromise that should(tm) have only a few seconds of
> outage.  But could blow up i suppose.  They all could blow up though
> so I'm not sure that should play a factor in the decision.
> 
> I can't be the first person to go down this path.  So what do you all
> think?  how have you done it in the past?

-- 
Ken Gaillot <kgaillot at redhat.com>