[Pacemaker] DRBD primary/primary + Pacemaker goes into split brain after crm node standby/online

Andrew Beekhof andrew at beekhof.net
Wed Jun 11 19:55:14 EDT 2014


On 12 Jun 2014, at 12:13 am, Alexis de BRUYN <alexis.mailinglist at de-bruyn.fr> wrote:

> On 10.06.2014 01:44, Andrew Beekhof wrote:
>> 
>> On 10 Jun 2014, at 4:07 am, Alexis de BRUYN <alexis.mailinglist at de-bruyn.fr> wrote:
>> 
>>> Hi Everybody,
>>> 
>>> I have an issue with a 2-node Debian Wheezy primary/primary DRBD
>>> Pacemaker/Corosync configuration.
>>> 
>>> After a 'crm node standby' then a 'crm node online', the DRBD volume
>>> stays in a 'split brain state' (cs:StandAlone ro:Primary/Unknown).
>>> 
>>> A soft or hard reboot of one node gets rid of the split brain and/or
>>> doesn't create one.
>>> 
>>> I have followed http://www.drbd.org/users-guide-8.3/ and keep my tests
>>> as simple as possible (no activity and no filesystem on the DRBD volume).
>>> 
>>> I don't see what I am doing wrong. Could anybody help me with this, please?
>> 
>> There could be a pacemaker bug.  
>> Master/slave resources are quite complex internally and have received many improvements in the years since 1.1.7.
>> So simply upgrading pacemaker could be the answer.
> 
> Hi Andrew,
> 
> I have followed your advice and updated Pacemaker/Corosync by installing
> a fresh Debian Sid but I still have the issue with the following packages:

I don't know exactly what went into those packages and there have been more fixes (aren't there always :-/) since 1.1.10, but it is certainly recent enough to deserve a closer look.

Could you run crm_report for the period covered by your test? (No need to reproduce; just tell crm_report when you did the test and it will create a tarball for you to attach here.)
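
For illustration, such a crm_report invocation might look like the sketch below. The time window and the destination name are hypothetical placeholders, not values from this thread; substitute the period in which the standby/online test was actually run. The -f and -t options give the start and end of the period, and the last argument is the name used for the report archive.

```shell
# Hypothetical time window: replace with when the standby/online test was run.
FROM="2014-06-11 14:00:00"
TO="2014-06-11 15:00:00"
# Hypothetical destination name for the report archive.
DEST="/tmp/drbd-standby-online"

# Print the command first so the window can be checked before running it.
report_cmd() {
    printf 'crm_report -f "%s" -t "%s" %s\n' "$FROM" "$TO" "$DEST"
}

report_cmd

# Run it only where crm_report is installed (i.e. on a cluster node).
if command -v crm_report >/dev/null 2>&1; then
    crm_report -f "$FROM" -t "$TO" "$DEST"
fi
```

Running it on one cluster node is normally enough, since crm_report tries to collect logs and configuration from the other nodes as well.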

> 
> # uname -a
> Linux testvm1 3.13-1-amd64 #1 SMP Debian 3.13.10-1 (2014-04-15) x86_64
> GNU/Linux
> 
> # cat /etc/issue && dpkg -l | egrep "corosync|pacemaker|drbd"
> Debian GNU/Linux jessie/sid \n \l
> 
> ii  corosync                       1.4.6-1                     amd64
>    Standards-based cluster framework (daemon and modules)
> ii  crmsh                          1.2.6+git+e77add-1.2        amd64
>    CRM shell for the pacemaker cluster manager
> ii  drbd8-utils                    2:8.4.4-1                   amd64
>    RAID 1 over TCP/IP for Linux (user utilities)
> ii  pacemaker                      1.1.10+git20130802-4        amd64
>    HA cluster resource manager
> ii  pacemaker-cli-utils            1.1.10+git20130802-4        amd64
>    Command line interface utilities for Pacemaker
> 
> And with the "experimental" packages, I also cannot connect to the
> cluster via crmsh:
> 
> # cat /etc/issue && dpkg -l | egrep "corosync|pacemaker|drbd"
> Debian GNU/Linux jessie/sid \n \l
> 
> ii  corosync                       2.3.3-1                     amd64
>    Standards-based cluster framework (daemon and modules)
> ii  crmsh                          1.2.6+git+e77add-1.2        amd64
>    CRM shell for the pacemaker cluster manager
> ii  drbd8-utils                    2:8.4.4-1                   amd64
>    RAID 1 over TCP/IP for Linux (user utilities)
> ii  libcorosync-common4            2.3.3-1                     amd64
>    Standards-based cluster framework, common library
> ii  pacemaker                      1.1.11-1                    amd64
>    HA cluster resource manager
> ii  pacemaker-cli-utils            1.1.11-1                    amd64
>    Command line interface utilities for Pacemaker
> 
> I will try to build the latest versions of Pacemaker/Corosync on a Debian
> Wheezy before reporting my issue via Bugzilla.
> 
> Thanks for your help.
> 
> 
> -- 
> Alexis de BRUYN
> 
