[ClusterLabs] Rolling upgrade from Corosync 2.3+ to Corosync 2.99+ or Corosync 3.0+?

Vitaly Zolotusky vitaly at unitc.com
Thu Jun 11 12:26:18 EDT 2020


Hello, Strahil.
Thanks for your suggestion. 
We are doing something similar to what you suggest, but:
1. We do not have external storage. Our product is a single box with 2 internal heads and 10-14 PB of data (or it can be 9 boxes hooked together, still with just 2 heads and 9 times more storage).
2. Setting up a new cluster is kind of hard. We do that on extra partitions in a chroot while the old cluster is running, so the shutdown should be pretty short if we can figure out a way for the cluster to keep working while we configure the new partitions.
3. At this time we have to stop a node to move the configuration from the old partition to the new one, initialize new databases, etc. While we do that, the other node takes over all processing.
We will see if we can incorporate your suggestion into our upgrade path.
Thanks a lot for your help!
_Vitaly

 
> On June 11, 2020 12:00 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
> 
>  
> Hi Vitaly,
> 
> have you considered something like this:
> 1. Set up a new cluster.
> 2. Present the same shared storage to the new cluster.
> 3. Prepare the resource configuration, but do not apply it yet.
> 4. Power down all resources on the old cluster.
> 5. Deploy the resources on the new cluster and immediately bring them up.
> 6. Remove the old cluster's access to the shared storage.
> 7. Wipe the old cluster.
> 
> Downtime will be way shorter.
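[Editor's note: the cutover steps above could be sketched roughly as follows, assuming a pcs-managed Pacemaker cluster. The resource name, device paths, and CIB file name are hypothetical placeholders, not from the thread.]

```shell
#!/bin/sh
# Sketch of the cutover, under the assumptions stated above.
set -eu

# Step 3: prepare the resource configuration offline in a scratch CIB
# file, without applying it to the live cluster.
pcs cluster cib new-cluster.xml
pcs -f new-cluster.xml resource create big_fs ocf:heartbeat:Filesystem \
    device=/dev/shared/vol0 directory=/data fstype=xfs

# Step 4: power down all resources on the old cluster (run on an old node).
pcs cluster stop --all

# Step 5: push the prepared CIB on the new cluster; Pacemaker starts the
# resources as soon as the configuration lands.
pcs cluster cib-push new-cluster.xml

# Steps 6-7: revoke the old cluster's access to the shared storage (e.g.
# unmap its LUNs at the array), then dismantle it, for example with
# `pcs cluster destroy --all` on the old nodes.
```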
> 
> Best Regards,
> Strahil Nikolov
> 
> On 11 June 2020 at 17:48:47 GMT+03:00, Vitaly Zolotusky <vitaly at unitc.com> wrote:
> >Thank you very much for the quick reply!
> >I will try to either build the new version on Fedora 22 or build the
> >old version on CentOS 8, and do the HA stack upgrade separately from
> >my full product/OS upgrade. A lot of my customers would be extremely
> >unhappy with even a short downtime, so I can't really do the full
> >upgrade offline.
> >Thanks again!
> >_Vitaly
> >
> >> On June 11, 2020 10:14 AM Jan Friesse <jfriesse at redhat.com> wrote:
> >> 
> >>  
> >> > Thank you very much for your help!
> >> > We did try to go to V3.0.3-5 and then dropped to 2.99 in the hope
> >> > that it might work with a rolling upgrade (we were fooled by the
> >> > matching major version (2)). Our fresh install works fine on
> >> > V3.0.3-5.
> >> > Do you know if it is possible to build Pacemaker 3.0.3-5 and
> >> > Corosync 2.0.3 on Fedora 22 so that I
> >> 
> >> Good question. Fedora 22 is quite old, but it is close to RHEL 7, for
> >> which we build packages automatically (https://kronosnet.org/builds/),
> >> so it should be possible. But you are really on your own, because I
> >> don't think anybody has ever tried it.
> >> 
> >> Regards,
> >>    Honza
> >> 
> >> 
> >> 
> >> > upgrade the stack before starting the "real" upgrade of the product?
> >> > Then I can do the following sequence:
> >> > 1. "quick" full shutdown for the HA stack upgrade to version 3.0
> >> > 2. start the HA stack on the old OS and product version with
> >> >    Pacemaker 3.0.3 and bring the product online
> >> > 3. start the rolling upgrade of the product to the new OS and
> >> >    product version
> >> > Thanks again for your help!
> >> > _Vitaly
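[Editor's note: the quick stack-upgrade sequence above might look roughly like this on a pcs-managed cluster; the package names follow the RHEL/CentOS convention and are an assumption, not from the thread.]

```shell
#!/bin/sh
# Sketch of the "quick full shutdown" stack upgrade, under the
# assumptions stated above.
set -eu

# 1. Quick full shutdown of the HA stack across the cluster.
pcs cluster stop --all

# Upgrade the HA stack packages on each node while the cluster is down.
yum upgrade -y corosync pacemaker pcs

# 2. Bring the stack back up on the old OS/product version; the managed
#    resources (the product) come back online with it.
pcs cluster start --all

# 3. The product/OS rolling upgrade then proceeds node by node as usual.
```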
> >> > 
> >> >> On June 11, 2020 3:30 AM Jan Friesse <jfriesse at redhat.com> wrote:
> >> >>
> >> >>   
> >> >> Vitaly,
> >> >>
> >> >>> Hello everybody.
> >> >>> We are trying to do a rolling upgrade from Corosync 2.3.5-1 to
> >> >>> Corosync 2.99+. It looks like they are not compatible and we are
> >> >>> getting messages like:
> >> >>
> >> >> Yes, they are not wire compatible. Also, please do not use the 2.99
> >> >> versions; these were alpha/beta/rc releases before 3.0, and 3.0 has
> >> >> been out for quite a long time now (3.0.4 is the latest and I would
> >> >> recommend using it - there were quite a few important bugfixes
> >> >> between 3.0.0 and 3.0.4).
> >> >>
> >> >>
> >> >>> Jun 11 02:10:20 d21-22-left corosync[6349]:   [TOTEM ] Message
> >> >>> received from 172.18.52.44 has bad magic number (probably sent by
> >> >>> Corosync 2.3+).. Ignoring
> >> >>> on the upgraded node, and
> >> >>> Jun 11 01:02:37 d21-22-right corosync[14912]:   [TOTEM ] Invalid
> >> >>> packet data
> >> >>> Jun 11 01:02:38 d21-22-right corosync[14912]:   [TOTEM ] Incoming
> >> >>> packet has different crypto type. Rejecting
> >> >>> Jun 11 01:02:38 d21-22-right corosync[14912]:   [TOTEM ] Received
> >> >>> message has invalid digest... ignoring.
> >> >>> on the pre-upgrade node.
> >> >>>
> >> >>> Is there a good way to do this upgrade?
> >> >>
> >> >> Usually the best way is to start from scratch in a testing
> >> >> environment to make sure everything works as expected. Then you can
> >> >> shut down the current cluster, upgrade, and start it again - the
> >> >> config file is mostly compatible; you may just consider changing the
> >> >> transport to knet. I don't think there is any definitive guide to
> >> >> doing the upgrade without shutting down the whole cluster, but
> >> >> somebody else may have an idea.
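[Editor's note: a corosync 3.x totem/nodelist configuration using the knet transport mentioned above might look like the fragment below. The cluster name reuses the hostnames from the logs in this thread; the addresses and crypto settings are illustrative examples, not a recommendation from the list.]

```
totem {
    version: 2                 # config format version, not the corosync version
    cluster_name: mycluster    # example name
    transport: knet            # replaces udp/udpu from corosync 2.x
    crypto_cipher: aes256      # example crypto settings
    crypto_hash: sha256
}

nodelist {
    node {
        ring0_addr: 172.18.52.43   # example address
        name: d21-22-left
        nodeid: 1
    }
    node {
        ring0_addr: 172.18.52.44   # example address
        name: d21-22-right
        nodeid: 2
    }
}
```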
> >> >>
> >> >> Regards,
> >> >>     Honza
> >> >>
> >> >>> I would appreciate it very much if you could point me to any
> >> >>> documentation or articles on this issue.
> >> >>> Thank you very much!
> >> >>> _Vitaly
> >> >>> _______________________________________________
> >> >>> Manage your subscription:
> >> >>> https://lists.clusterlabs.org/mailman/listinfo/users
> >> >>>
> >> >>> ClusterLabs home: https://www.clusterlabs.org/
> >> >>>
> >> >

