[ClusterLabs] Upgrade to OLE8 + Pacemaker

Tue Oct 10 11:32:07 EDT 2023

Hi Qusay,

please find my responses inline.

On 10/6/23 14:53, Jibrail, Qusay (GfK) via Users wrote:
> Hi,
> 
> May I get an answer please?
> 
> Kind regards,
> 
> *––*
> 
> *Qusay Jibrail*
> 
> Senior Infrastructure Engineer – Linux | GfK IT Services
> GfK – an NIQ company |The Netherlands
> Krijgsman 22-25 | Amstelveen | 1186 DM
> T: +31 88 435 1232 | M: +31 628 927 686
> 
> website <https://www.gfk.com/home?utm_campaign=global_all_2022_email%20signature&utm_source=email&utm_medium=signature&utm_content=>
> blog <https://www.gfk.com/blog?utm_campaign=global_all_2022_email%20signature&utm_source=email&utm_medium=signature&utm_content=>
> instagram <https://instagram.com/gfk.insider>
> linkedin <https://linkedin.com/company/gfk>
> youtube <https://www.youtube.com/user/GfKTube>
> twitter <https://twitter.com/gfk>
> 
> *From:* Jibrail, Qusay (GfK)
> *Sent:* Wednesday, 4 October 2023 11:11
> *To:* Cluster Labs - All topics related to open-source clustering 
> welcomed <users at clusterlabs.org>
> *Subject:* RE: [ClusterLabs] Upgrade to OLE8 + Pacemaker
> 
> Hi Tomas,
> 
> OK, it is getting a little bit complicated.
> 
> What about this approach:
> 
>   * pcs cluster stop server3, upgrade it to OLE8 + update pacemaker,
>     corosync and pcs, check that postfix is working, wait 1 day.
>   * pcs cluster stop server4, upgrade it to OLE8 + update pacemaker,
>     corosync and pcs, check that postfix is working, wait 1 day. Now
>     both servers run the same versions of the OS, pacemaker, corosync
>     and pcs.
>   * Then pcs cluster start server3 and pcs cluster start server4.
> 
> Will the above work?

If you expect postfix to start in the cluster on the upgraded node
"server3", it will not start, because the cluster will wait until it
forms quorum with the other node. The other node will not be able to
communicate with it until it is upgraded as well. One possible way to
start the postfix resource would be to remove the second node from the
cluster configuration... This brings us back to the one-node cluster
solution which Tomas proposed.
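Because a Corosync 2 node and a Corosync 3 node cannot talk to each other, it may help to compare the major versions before trying to start the cluster on an upgraded node. A minimal sketch, assuming the `corosync -v` output format shown later in this thread; `corosync_major` and `same_corosync_major` are hypothetical helper names, not part of corosync or pcs:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of corosync or pcs).

# Read `corosync -v` output on stdin and print the major version,
# e.g. "Corosync Cluster Engine, version '2.4.5'" -> 2
corosync_major() {
  sed -n "s/.*version '\([0-9][0-9]*\)\..*/\1/p" | head -n 1
}

# Succeed only when both `corosync -v` outputs report the same major
# version; a 2.x node and a 3.x node cannot form one membership.
same_corosync_major() {
  local a b
  a="$(printf '%s\n' "$1" | corosync_major)"
  b="$(printf '%s\n' "$2" | corosync_major)"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, assuming root SSH access from server3 to server4: `same_corosync_major "$(corosync -v)" "$(ssh server4 corosync -v)" || echo "corosync major versions differ"`.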

Your solution could work once some obstacles are solved, but your
cluster configuration files would not be updated. For example, you would
not benefit from the new Corosync network transport layer, knet.
Therefore I think a new cluster is the better solution.

I would create a new one-node cluster with the resources and then, once
the second node is upgraded, add it by using the command
`pcs cluster node add`. This makes sure that the cluster configuration
files are in their current format.
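Adding the upgraded second node from the node that already runs the new one-node cluster could look like this. A minimal dry-run sketch for pcs 0.10 (0.10.15 per the OL8 package list in this thread); `add_upgraded_node` and the DRY_RUN switch are hypothetical, and `--start --enable` are optional flags that start the cluster on the added node and enable it at boot:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: commands are printed unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Run on the node hosting the new one-node cluster, after the other
# node has been upgraded to OL8 (pcs 0.10 syntax).
add_upgraded_node() {
  local node="$1"
  run pcs host auth "$node"   # prompts for the hacluster credentials
  run pcs cluster node add "$node" --start --enable
  run pcs status              # check that the new node runs resources
}
```

Usage: `add_upgraded_node server4` prints the three commands; `DRY_RUN=0 add_upgraded_node server4` runs them.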

An example procedure could look like this:
* back up the configuration files (`pcs config backup`, /etc files...)
* put one node into standby mode by using the command `pcs cluster
  standby` and check that all resources have moved to the other node
  (`pcs status`)
* stop the cluster on the node (`pcs cluster stop`)
* remove the cluster configuration from the node (`pcs cluster destroy`)
* upgrade to OLE8 or do a fresh install (which will be easier)
* after the reboot, set up a new one-node cluster (`pcs host auth`,
  `pcs cluster setup`)
* start the cluster (`pcs cluster start`)
* set up resources (`pcs stonith create`, `pcs resource create`)
* check that the resources are working
* back up the configuration files on the second node
* stop the cluster on the second node (`pcs cluster stop`)
* remove the cluster configuration from the second node
  (`pcs cluster destroy`)
* upgrade or fresh install + reboot
* on the first node, add the second node (`pcs host auth`,
  `pcs cluster node add`)
* check that the added node is running resources (`pcs status`)
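The first part of the procedure above can be sketched as two shell functions. This is a hedged sketch, not a tested runbook: it only prints the commands unless DRY_RUN=0, and the cluster name `postfix-cluster`, the backup paths and the `systemd:postfix` resource agent are assumptions (the thread does not show which agent `smtpout-postfix-res` uses, so check your `pcs config` backup). As far as I know, knet is the default transport of `pcs cluster setup` in pcs 0.10. Fencing is left out because the agent and its options are site specific.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the first part of the procedure; set DRY_RUN=0 to
# run the commands for real. Cluster name, backup paths and resource
# agent are assumptions, not taken from the original configuration.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# On the OL7 node that is upgraded first (pcs 0.9), while the old
# two-node cluster is still running.
drain_node() {
  local node="$1"
  run pcs config backup "/root/${node}-pcs-backup"
  run tar -czf "/root/${node}-etc-backup.tar.gz" /etc/corosync /etc/postfix
  run pcs cluster standby "$node"   # resources move to the other node
  run pcs status                    # confirm they are running there
  run pcs cluster stop "$node"
  run pcs cluster destroy           # removes cluster config on this node only
}

# On the same node after the OL8 upgrade or reinstall (pcs 0.10).
setup_one_node_cluster() {
  local cluster="$1" node="$2"
  run pcs host auth "$node"
  run pcs cluster setup "$cluster" "$node"
  run pcs cluster start
  # Fencing (pcs stonith create ...) is site specific and omitted here.
  run pcs resource create smtpout-postfix-res systemd:postfix clone
  run pcs status
}
```

Run `drain_node server3` on server3 while it is still part of the old cluster, and `setup_one_node_cluster postfix-cluster server3` once OL8 is installed; the second node is then added with `pcs cluster node add`.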

> We will have 2 days without load balancing.
> 
> As the server will be rebooted during the OLE8 upgrade, is it better to
> prevent the pacemaker, corosync and pcs services from starting after
> the reboot?
> 
> Or will they not start until I run pcs cluster start server3 and
> pcs cluster start server4?

To prevent the cluster from starting after a reboot, you need to disable
the pacemaker and corosync services by using the command
`pcs cluster disable` (on each node, or `pcs cluster disable --all` for
all nodes at once).

Your status output shows that they are enabled:
```
Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled
```
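A hedged sketch for checking this before the reboot: `enabled_daemons` (a hypothetical helper) reads `pcs status` output on stdin and lists the daemons whose boot state is `enabled`, and `disable_cluster_autostart` (also hypothetical) wraps `pcs cluster disable` followed by a `systemctl is-enabled` check. Commands are only printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; commands are printed unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Read `pcs status` output on stdin and print each daemon listed under
# "Daemon Status:" whose boot state (after the "/") is "enabled".
enabled_daemons() {
  awk '
    /^Daemon Status:/ { in_block = 1; next }
    in_block && NF == 2 && $1 ~ /:$/ {
      name = $1
      sub(/:$/, "", name)
      if (split($2, state, "/") == 2 && state[2] == "enabled") print name
    }
  '
}

# Keep corosync and pacemaker from starting at boot on this node,
# then show their boot state (pcsd is left enabled).
disable_cluster_autostart() {
  run pcs cluster disable
  run systemctl is-enabled corosync pacemaker
}
```

With the status shown above, `pcs status | enabled_daemons` would print corosync, pacemaker and pcsd; after `DRY_RUN=0 disable_cluster_autostart`, only pcsd should remain enabled, which is what you want for `pcs host auth` later.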

> Kind regards,
> 
> *––*
> 
> *Qusay Jibrail*
> 
> *From:* Users <users-bounces at clusterlabs.org> *On Behalf Of* Tomas Jelinek
> *Sent:* Tuesday, 3 October 2023 16:50
> *To:* users at clusterlabs.org
> *Subject:* Re: [ClusterLabs] Upgrade to OLE8 + Pacemaker
> 
> On 03. 10. 23 at 16:24, Jibrail, Qusay (GfK) via Users wrote:
> 
>     Hi Reid,
> 
>     Thank you for the answer.
> 
>     So my plan will be:
> 
>      1. pcs config backup /root/"Server Name"
>      2. Create a backup of /etc/corosync/
>      3. Create a backup of /etc/postfix
>      4. pcs cluster stop server3, just to fail over to server4.
>         The command pcs cluster stop server3 will stop corosync and
>         pacemaker, right?
> 
> Hi,
> 
> Yes, 'pcs cluster stop' command stops both pacemaker and corosync.
> 
>      5. Run pcs status on server3, which should give an error message;
>         on server4 it should show one offline node and one online node.
>      6. Upgrade server3 to OLE8, which will upgrade these 3 packages to:
>           corosync    x86_64    3.1.7-1.el8
>           pacemaker   x86_64    2.1.5-9.3.0.1.el8_8
>           pcs         x86_64    0.10.15-4.0.1.el8_8.1
>      7. Run crm_verify to check the configuration. If the
>         verification is OK, then
>      8. pcs cluster start server3
>      9. Run pcs status on both nodes.
> 
>     Please see the versions of the currently installed software:
> 
>     [root@server3 ~]# corosync -v
>     Corosync Cluster Engine, version '2.4.5'
>     Copyright (c) 2006-2009 Red Hat, Inc.
> 
>     [root@server3 ~]# pacemakerd --version
>     Pacemaker 1.1.23-1.0.1.el7_9.1
>     Written by Andrew Beekhof
> 
>     [root@server3 ~]# pcs --version
>     0.9.169
> 
>     Did I miss anything?
> 
> Corosync 3 is not compatible with Corosync 2. So once you update server3 
> to OLE8, it won't be able to join server4 in the cluster and take over 
> cluster resources.
> 
> If you are restricted to two nodes, you may remove server3 from the 
> cluster, update server3 to OLE8 and create a one-node cluster on 
> server3. Once you have two one-node clusters, move the resources from
> the server4 cluster to the server3 cluster manually. Then destroy the
> cluster on server4, update server4 to OLE8, and add server4 to the new
> cluster.
> 
> Regards,
> Tomas
> 
>     Kind regards,
> 
>     *––*
> 
>     *Qusay Jibrail*
> 
>     *From:* Reid Wahl <nwahl at redhat.com>
>     *Sent:* Tuesday, 3 October 2023 09:03
>     *To:* Cluster Labs - All topics related to open-source clustering
>     welcomed <users at clusterlabs.org>
>     *Cc:* Jibrail, Qusay (GfK) <Qusay.Jibrail at gfk.com>
>     *Subject:* Re: [ClusterLabs] Upgrade to OLE8 + Pacemaker
> 
>     On Mon, Oct 2, 2023 at 10:51 PM Jibrail, Qusay (GfK) via Users
>     <users at clusterlabs.org> wrote:
> 
>         Hello,
> 
>         I am aiming to upgrade one of the cluster nodes to OLE8
>         (currently OLE7) and test whether postfix is working fine.
> 
>         If it is, I will upgrade the second node to OLE8.
> 
>         My questions:
> 
>         Will the Pacemaker configuration work after the upgrade?
> 
>     Hi,
> 
>     It should. Pacemaker supports rolling upgrades from 1.1.11 (and
>     above) to 2.x.x. Other components besides Pacemaker may break, so
>     I'd suggest having a backout plan before any upgrade activity.
> 
>         Do I need to make any changes before or after the upgrade to OLE8?
> 
>         So server3 will be done first and then server4. Is that the
>         right order?
> 
>         Do I need to stop any services before the upgrade?
> 
>     For Pacemaker, follow the procedure at
>     <https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Administration/singlehtml/#upgrading-a-pacemaker-cluster>.
> 
>     Based on your plan, you should focus particularly on the rolling
>     upgrade section.
> 
>     Refer to your OS vendor's documentation for any other steps you
>     should take during an OS upgrade.
> 
>         [root@server3 ~]# pacemakerd --version
>         Pacemaker 1.1.23-1.0.1.el7_9.1
>         Written by Andrew Beekhof
> 
>         [root@server3 ~]# pcs status
>         Cluster name: xxxxxxxxxx
>         Stack: corosync
>         Current DC: server3 (version 1.1.23-1.0.1.el7_9.1-9acf116022) - partition with quorum
>         Last updated: Tue Oct  3 07:29:46 2023
>         Last change: Sun May  1 17:02:03 2022 by hacluster via crmd on server3
> 
>         2 nodes configured
>         2 resource instances configured
> 
>         Online: [ server3 server4 ]
> 
>         Full list of resources:
> 
>          Clone Set: smtpout-postfix-res-clone [smtpout-postfix-res]
>              Started: [ server3 server4 ]
> 
>         Daemon Status:
>           corosync: active/enabled
>           pacemaker: active/enabled
>           pcsd: active/enabled
> 
>         [root@server3 ~]# postconf -d | grep mail_version
>         mail_version = 2.10.1
>         milter_macro_v = $mail_name $mail_version
> 
>         [root@server3 ~]# lsb_release -a
>         LSB Version:    :core-4.1-amd64:core-4.1-noarch
>         Distributor ID: OracleServer
>         Description:    Oracle Linux Server release 7.9
>         Release:        7.9
>         Codename:       n/a
> 
>         Kind regards,
> 
>         *––*
> 
>         *Qusay Jibrail*
> 
>         _______________________________________________
>         Manage your subscription:
>         https://lists.clusterlabs.org/mailman/listinfo/users
>         <https://lists.clusterlabs.org/mailman/listinfo/users>
> 
>         ClusterLabs home: https://www.clusterlabs.org/
>         <https://www.clusterlabs.org/>
> 
> 
> 
>     -- 
> 
>     Regards,
> 
>     Reid Wahl (He/Him)
> 
>     Senior Software Engineer, Red Hat
> 
>     RHEL High Availability - Pacemaker
> 

Regards,
Miroslav