[ClusterLabs] Different pacemaker versions split cluster
kgaillot at redhat.com
Wed Jun 8 18:27:24 EDT 2016
On 06/07/2016 02:26 PM, DacioMF wrote:
> I cleared all logs in /var/log/corosync and rebooted the cluster (this is the test environment, but I want to upgrade production).
> I have attached the output of crm_report --from "2016-06-07 0:0:0", run after the reboot.
> The corosync and pacemaker versions on Ubuntu 16.04 are 2.3.5 and 1.1.14, respectively.
> The corosync and pacemaker versions on Ubuntu 14.04 are 2.3.3 and 1.1.10, respectively.
> DacioMF, Network and Infrastructure Analyst
This isn't causing your issue, but when running a mixed-version cluster,
it's essential that a node running the oldest version is elected DC. You
can ensure that by always booting and starting the cluster on it first.
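As a rough illustration of picking which nodes to start first, here is a minimal Python sketch. The node-to-version mapping is taken from this thread; the helper names are hypothetical and not part of any pacemaker tooling. Versions are compared numerically rather than as strings, since a string comparison would sort "1.1.9" after "1.1.10".

```python
def version_key(version):
    """Turn '1.1.10' into (1, 1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def oldest_version_nodes(versions):
    """Return the sorted names of the nodes running the lowest version."""
    lowest = min(version_key(v) for v in versions.values())
    return sorted(n for n, v in versions.items() if version_key(v) == lowest)


# Pacemaker versions per node, as reported in this thread.
pacemaker_versions = {
    "xenserver50": "1.1.10",
    "xenserver51": "1.1.10",
    "xenserver52": "1.1.14",
    "xenserver54": "1.1.14",
}

# Nodes that should be booted and started first so one of them becomes DC.
first_to_start = oldest_version_nodes(pacemaker_versions)
print(first_to_start)
```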
In this case, we're not getting that far, because the nodes aren't
talking to each other.
The corosync.quorum output shows that everything's fine at the cluster
membership level. This can also be seen in the live CIB where
in_ccm="true" for all nodes (indicating membership), but crmd="offline"
for the different-version nodes (indicating broken pacemaker communication).
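To make that check concrete, here is a minimal Python sketch that lists nodes which are corosync members but whose crmd is offline. The XML is a made-up, heavily trimmed status section (the ids other than 51 are assumed, and a real CIB has many more attributes and child elements); only the in_ccm and crmd attributes discussed above are relied on.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only, written to mirror the situation described above
# as seen from one of the 1.1.10 nodes. This is not taken from the real CIB.
SAMPLE_STATUS = """
<status>
  <node_state id="50" uname="xenserver50" in_ccm="true" crmd="online"/>
  <node_state id="51" uname="xenserver51" in_ccm="true" crmd="online"/>
  <node_state id="52" uname="xenserver52" in_ccm="true" crmd="offline"/>
  <node_state id="54" uname="xenserver54" in_ccm="true" crmd="offline"/>
</status>
"""


def members_without_crmd(cib_xml):
    """Return unames of nodes with in_ccm="true" but crmd="offline"."""
    root = ET.fromstring(cib_xml)
    return [
        node.get("uname")
        for node in root.iter("node_state")
        if node.get("in_ccm") == "true" and node.get("crmd") == "offline"
    ]


print(members_without_crmd(SAMPLE_STATUS))
```

Any node this returns is visible at the membership layer but not reachable at the pacemaker layer, which is the pattern seen here.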
In the logs, we can see "state is now member" for all four nodes, but
pcmk_cpg_membership only sees the nodes with the same version.
I suspect the problem is in corosync's cpg handling, since
pcmk_cpg_membership logs everything it gets from corosync. I'm not
familiar with any relevant changes between 2.3.3 and 2.3.5, so I'm not
sure what's going wrong.
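For anyone wanting to check for the same symptom, here is a minimal Python sketch that compares the "Online" lists from the two crm status outputs quoted below. The function name is hypothetical, and the parsing assumes the "Online: [ ... ]" line format shown in those outputs.

```python
import re


def online_nodes(status_text):
    """Return the set of node names on the 'Online: [ ... ]' line."""
    match = re.search(r"^Online: \[([^\]]*)\]", status_text, re.MULTILINE)
    return set(match.group(1).split()) if match else set()


# Relevant lines copied from the two status outputs in this thread.
view_from_1404 = """Online: [ xenserver50 xenserver51 ]
OFFLINE: [ xenserver52 xenserver54 ]"""
view_from_1604 = """Online: [ xenserver52 xenserver54 ]
OFFLINE: [ xenserver50 xenserver51 ]"""

side_a = online_nodes(view_from_1404)
side_b = online_nodes(view_from_1604)

# If the two views share no online node, each side sees only itself.
views_are_disjoint = side_a.isdisjoint(side_b)
print(views_are_disjoint)
```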
> On Monday, June 6, 2016 at 17:30, Ken Gaillot <kgaillot at redhat.com> wrote:
> On 05/30/2016 01:14 PM, DacioMF wrote:
>> I had 4 nodes running Ubuntu 14.04 LTS in my cluster, and all of them worked well. I need to upgrade all my cluster nodes to Ubuntu 16.04 LTS without stopping my resources. Two nodes have been upgraded to 16.04 and the other two remain on 14.04. The problem is that my cluster has split: the nodes on Ubuntu 14.04 only work with each other, and the same is true for the nodes on Ubuntu 16.04. The pacemaker feature set in Ubuntu 14.04 is v3.0.7, and in 16.04 it is v3.0.10.
>> The following commands show what's happening:
>> root@xenserver50:/var/log/corosync# crm status
>> Last updated: Thu May 19 17:19:06 2016
>> Last change: Thu May 19 09:00:48 2016 via cibadmin on xenserver50
>> Stack: corosync
>> Current DC: xenserver51 (51) - partition with quorum
>> Version: 1.1.10-42f2063
>> 4 Nodes configured
>> 4 Resources configured
>> Online: [ xenserver50 xenserver51 ]
>> OFFLINE: [ xenserver52 xenserver54 ]
>> root@xenserver52:/var/log/corosync# crm status
>> Last updated: Thu May 19 17:20:04 2016 Last change: Thu May 19 08:54:57 2016 by hacluster via crmd on xenserver54
>> Stack: corosync
>> Current DC: xenserver52 (version 1.1.14-70404b0) - partition with quorum
>> 4 nodes and 4 resources configured
>> Online: [ xenserver52 xenserver54 ]
>> OFFLINE: [ xenserver50 xenserver51 ]
>> xenserver52 and xenserver54 run Ubuntu 16.04; the others run Ubuntu 14.04.
>> Does anyone know what the problem is?
>> Sorry for my poor English.
>> Best regards,
>> DacioMF, Network and Infrastructure Analyst
> We aim for backward compatibility, so this likely is a bug. Can you
> attach the output of crm_report from around this time?
> crm_report --from "YYYY-M-D H:M:S" --to "YYYY-M-D H:M:S"
> FYI, you cannot do a rolling upgrade from corosync 1 to corosync 2, but
> I believe both 14.04 and 16.04 use corosync 2.
> Users mailing list: Users at clusterlabs.org
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org