[ClusterLabs] Proper procedure for pacemaker RPM upgrades in active cluster

Mon Jan 15 18:10:31 EST 2018

On Mon, 2018-01-15 at 15:42 -0500, Doug Cahill wrote:
> Hello,
> 
> I'm looking for some guidance on pacemaker RPM upgrades in a running
> cluster environment.  I'm looking to automate the process of
> upgrading the RPMs when we decide to plan an upgrade cycle for our
> clusters.
> 
> What I found is that during the RPM upgrade process the
> pacemaker.x86_64 RPM will shut down the pacemaker service.  My
> question regarding this is: is it possible to upgrade the RPM
> component but delay the restart of the pacemaker service to a later
> time?  If delaying the restart isn't possible, what is the preferred
> process for people with existing clusters that require package
> upgrades?  Should I upgrade the passive side first, then fail over to
> it, and then upgrade the other node, which is now passive?  Does
> pacemaker support running two nodes at different version levels
> during the upgrade process?  Would enabling maintenance mode be
> appropriate/ideal for this?

Yes to most of those :)

Detailed information about upgrade techniques:

http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#_upgrading

Basically, the failover scenario you mentioned is the "rolling upgrade"
technique, and the maintenance mode scenario you mentioned is the
"detach and reattach" technique.
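For illustration only, here is a rough sketch of the rolling-upgrade order as a dry-run plan. It assumes crmsh syntax (`crm node standby`/`crm node online`), CentOS 6 init scripts (`service ...`) and `yum`; the node names and the package list are hypothetical. Nothing is executed: the plan is only built and printed so it can be reviewed before running each step by hand or over ssh.

```python
from typing import List, Sequence, Tuple

# A step is (node to run it on, argv). Steps are only collected, never run.
Step = Tuple[str, List[str]]


def rolling_upgrade_plan(nodes: Sequence[str],
                         packages: Sequence[str] = ("pacemaker",)) -> List[Step]:
    """Upgrade one node at a time while the other node keeps the resources."""
    plan: List[Step] = []
    for node in nodes:
        plan += [
            # Move resources off this node (crmsh syntax, assumed).
            (node, ["crm", "node", "standby", node]),
            # One-shot status check: confirm resources are running elsewhere.
            (node, ["crm_mon", "-1"]),
            # Stop the cluster stack on this node only (CentOS 6 init scripts).
            (node, ["service", "pacemaker", "stop"]),
            (node, ["service", "corosync", "stop"]),
            # Upgrade while the stack is down on this node.
            (node, ["yum", "-y", "update", *packages]),
            # Start the stack again and let the node host resources.
            (node, ["service", "corosync", "start"]),
            (node, ["service", "pacemaker", "start"]),
            (node, ["crm", "node", "online", node]),
        ]
    return plan


if __name__ == "__main__":
    # Hypothetical node names; the currently passive node goes first.
    for node, argv in rolling_upgrade_plan(["node2", "node1"]):
        print(f"[{node}] {' '.join(argv)}")
```

Listing the passive node first matches the "upgrade the passive side, fail over, upgrade the other node" order from the question. Putting the node in standby also moves resources off it before its stack is stopped, so the package's own service shutdown no longer matters.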

Each has advantages and disadvantages. A rolling upgrade lets you keep
one node on a known working setup as long as possible, while a
detach-and-reattach gives you zero downtime (as long as the upgrade has
no problems ...).
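A similar dry-run sketch of detach-and-reattach is shown below. It assumes that crmsh's `crm configure property maintenance-mode=...` is used to detach and reattach, and that the same hypothetical CentOS 6 service and `yum` commands apply. While maintenance mode is on, Pacemaker leaves resources running but stops managing them, so each node's stack can be stopped and upgraded in turn.

```python
from typing import List, Sequence, Tuple

# A step is (node to run it on, argv). Steps are only collected, never run.
Step = Tuple[str, List[str]]


def detach_reattach_plan(nodes: Sequence[str],
                         packages: Sequence[str] = ("pacemaker",)) -> List[Step]:
    """Upgrade every node while the cluster leaves resources untouched."""
    first = nodes[0]
    # Detach: resources keep running, but Pacemaker stops managing them.
    plan: List[Step] = [
        (first, ["crm", "configure", "property", "maintenance-mode=true"]),
    ]
    for node in nodes:
        plan += [
            # Stop, upgrade and restart the stack on one node at a time.
            (node, ["service", "pacemaker", "stop"]),
            (node, ["service", "corosync", "stop"]),
            (node, ["yum", "-y", "update", *packages]),
            (node, ["service", "corosync", "start"]),
            (node, ["service", "pacemaker", "start"]),
        ]
    plan += [
        # Check that the cluster sees the resources as still active...
        (first, ["crm_mon", "-1"]),
        # ...and only then let it manage them again (reattach).
        (first, ["crm", "configure", "property", "maintenance-mode=false"]),
    ]
    return plan


if __name__ == "__main__":
    # Hypothetical node names.
    for node, argv in detach_reattach_plan(["node1", "node2"]):
        print(f"[{node}] {' '.join(argv)}")
```

The `crm_mon -1` check before reattaching reflects the "as long as the upgrade has no problems" caveat. If the cluster's view of the resources looks wrong at that point, it is safer to investigate while still in maintenance mode than after turning management back on.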

> 
> I last experienced this situation when I upgraded from 1.1.15 to
> 1.1.17.  Now that pacemaker 1.1.18 is available I'm looking to plan
> this process a little better and would like to know what others use
> as a procedure.
> 
> Basic software config:
> CentOS 6.x (2.6.32-696.13.2.el6.x86_64)
> pacemaker.x86_64       1.1.17-1.el6
> corosync.x86_64        2.4.2-1.el6
> crmsh.noarch           3.0.1_283-0
> Two-node cluster; resources are configured for active/passive
> operation.
> 
> Thanks,
> -Doug

On a side note, if you're building 1.1.18 packages yourself, it's a
good idea to use the latest upstream 1.1 branch, because it fixes an
important regression in 1.1.18.
-- 
Ken Gaillot <kgaillot at redhat.com>