[ClusterLabs] Failover problem with dual primary drbd
Eric Bourguinat
eric.bourguinat at steady-sun.com
Thu Jun 23 14:28:58 UTC 2016
Found it! I looked at /usr/lib/drbd/crm-fence-peer.sh and found my mistake
when I saw that the constraint is placed on the hostname.
hostname = ftpprod04 and ftpprod05
But I used pcmk-1 and pcmk-2 in the pacemaker config...
Since neither node is actually named ftpprod04, the rule "#uname ne
ftpprod04" matched both nodes, so the -INFINITY score applied to the
Master role everywhere and the surviving primary was demoted too.
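For anyone hitting the same thing: crm-fence-peer.sh builds its
rsc_location rule from the hostname (uname -n), so the node names known to
pacemaker/corosync must match it. A minimal corosync.conf nodelist sketch
with matching names (the nodeids are assumptions, and I reuse my DRBD
replication addresses here purely for illustration; your ring network may
differ):

nodelist {
    node {
        # must match uname -n, so the crm-fence-peer.sh
        # constraint applies to the right node
        name: ftpprod04
        # ring address: my DRBD replication IP, for illustration only
        ring0_addr: 192.168.122.101
        nodeid: 1
    }
    node {
        name: ftpprod05
        ring0_addr: 192.168.122.102
        nodeid: 2
    }
}

With the names declared like this, #uname resolves to ftpprod04/ftpprod05
and the fencing rule excludes only the peer.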
Eric
On 23/06/2016 09:14, Eric Bourguinat wrote:
> My drbd config
>
> drbdadm dump all
>
> # /etc/drbd.conf
> global {
>     usage-count yes;
>     cmd-timeout-medium 600;
>     cmd-timeout-long 0;
> }
>
> common {
> }
>
> # resource home on ftpprod04: not ignored, not stacked
> # defined at /etc/drbd.d/home.res:1
> resource home {
>     on ftpprod04 {
>         device /dev/drbd1 minor 1;
>         disk /dev/vghome/lvhome;
>         meta-disk internal;
>         address ipv4 192.168.122.101:7789;
>     }
>     on ftpprod05 {
>         device /dev/drbd1 minor 1;
>         disk /dev/vghome/lvhome;
>         meta-disk internal;
>         address ipv4 192.168.122.102:7789;
>     }
>     net {
>         protocol C;
>         verify-alg sha1;
>         allow-two-primaries yes;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>         sndbuf-size 512k;
>     }
>     disk {
>         resync-rate 110M;
>         on-io-error detach;
>         fencing resource-and-stonith;
>         al-extents 3389;
>     }
>     handlers {
>         split-brain "/usr/lib/drbd/notify-split-brain.sh ******************";
>         fence-peer /usr/lib/drbd/crm-fence-peer.sh;
>         after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>     }
> }
>
> Eric
>
> On 23/06/2016 08:47, Eric Bourguinat wrote:
>> Hello,
>>
>> CentOS 7.2.1511 - pacemaker 1.1.13 - corosync 2.3.4 - drbd 8.4.7-1 -
>> drbd84-utils 8.9.5
>> Linux ftpprod04 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55
>> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux => pcmk-1
>> Linux ftpprod05 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55
>> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux => pcmk-2
>>
>> Source:
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/
>>
>> My resources are:
>>
>> pcs resource
>> Master/Slave Set: HomeDataClone [HomeData]
>>     Masters: [ pcmk-1 pcmk-2 ]
>> Clone Set: dlm-clone [dlm]
>>     Started: [ pcmk-1 pcmk-2 ]
>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>     ClusterIP:0 (ocf::heartbeat:IPaddr2): Started pcmk-1
>>     ClusterIP:1 (ocf::heartbeat:IPaddr2): Started pcmk-2
>> Clone Set: HomeFS-clone [HomeFS]
>>     Started: [ pcmk-1 pcmk-2 ]
>> Clone Set: Ftp-clone [Ftp]
>>     Started: [ pcmk-1 pcmk-2 ]
>> Clone Set: Sftp-clone [Sftp]
>>     Started: [ pcmk-1 pcmk-2 ]
>>
>> I have a problem when testing failover.
>>
>> "pkill -9 corosync" on pcmk-2
>> - stonith reboot pcmk-2 from pcmk-1
>> - a constraint is set
>> Jun 22 10:34:36 [1802] ftpprod04 cib: info:
>> cib_perform_op: ++ /cib/configuration/constraints: <rsc_location
>> rsc="HomeDataClone" id="drbd-fence-by-handler-home-HomeDataClone"/>
>> Jun 22 10:34:36 [1802] ftpprod04 cib: info:
>> cib_perform_op: ++ <rule
>> role="Master" score="-INFINITY"
>> id="drbd-fence-by-handler-home-rule-HomeDataClone">
>> Jun 22 10:34:36 [1802] ftpprod04 cib: info:
>> cib_perform_op: ++ <expression attribute="#uname" operation="ne"
>> value="ftpprod04" id="drbd-fence-by-handler-home-expr-HomeDataClone"/>
>> Jun 22 10:34:36 [1802] ftpprod04 cib: info:
>> cib_perform_op: ++ </rule>
>> Jun 22 10:34:36 [1802] ftpprod04 cib: info:
>> cib_perform_op: ++ </rsc_location>
>> Jun 22 10:34:36 [1802] ftpprod04 cib: info:
>> cib_process_request: Completed cib_create operation for section
>> constraints: OK (rc=0, origin=pcmk-1/cibadmin/2, version=0.584.0)
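>> Reassembled from those log lines, the constraint crm-fence-peer.sh
>> created is:
>>
>> <rsc_location rsc="HomeDataClone" id="drbd-fence-by-handler-home-HomeDataClone">
>>   <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-home-rule-HomeDataClone">
>>     <expression attribute="#uname" operation="ne" value="ftpprod04" id="drbd-fence-by-handler-home-expr-HomeDataClone"/>
>>   </rule>
>> </rsc_location>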
>> - but
>> Jun 22 10:34:36 [1806] ftpprod04 pengine: notice:
>> LogActions: Demote HomeData:0 (Master -> Slave pcmk-1)
>> Why does pengine demote my surviving node?
>> - the result is that all of the cluster's services are stopped
>> Stack: corosync
>> Current DC: pcmk-1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition
>> with quorum
>> 2 nodes and 14 resources configured
>>
>> Online: [ pcmk-1 ]
>> OFFLINE: [ pcmk-2 ]
>>
>> Full list of resources:
>>
>> Master/Slave Set: HomeDataClone [HomeData]
>>     Stopped: [ pcmk-1 pcmk-2 ]
>> Clone Set: dlm-clone [dlm]
>>     Stopped: [ pcmk-1 pcmk-2 ]
>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>     ClusterIP:0 (ocf::heartbeat:IPaddr2): Stopped
>>     ClusterIP:1 (ocf::heartbeat:IPaddr2): Stopped
>> Clone Set: HomeFS-clone [HomeFS]
>>     Stopped: [ pcmk-1 pcmk-2 ]
>> Clone Set: Ftp-clone [Ftp]
>>     Stopped: [ pcmk-1 pcmk-2 ]
>> Clone Set: Sftp-clone [Sftp]
>>     Stopped: [ pcmk-1 pcmk-2 ]
>> fence-pcmk-1 (stonith:fence_ovh): Stopped
>> fence-pcmk-2 (stonith:fence_ovh): Stopped
>>
>> PCSD Status:
>> pcmk-1: Online
>> pcmk-2: Online
>>
>> Daemon Status:
>> corosync: active/disabled
>> pacemaker: active/disabled
>> pcsd: active/enabled
>> - if I start the cluster on pcmk-2, DRBD resyncs, both nodes become
>> primary again, the constraint is removed, and all the services are started
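>> For reference, recovery on my side is roughly this (pcs syntax as in
>> Clusters from Scratch; the two status commands are only how I watch it,
>> not required steps):
>>
>> pcs cluster start pcmk-2   # rejoin the fenced node
>> cat /proc/drbd             # wait for the resync to finish
>> pcs constraint --full      # drbd-fence-by-handler-home-HomeDataClone is removed by crm-unfence-peer.sh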
>>
>> My constraints:
>> pcs constraint
>> Location Constraints:
>>   Resource: fence-pcmk-1
>>     Enabled on: pcmk-2 (score:INFINITY)
>>   Resource: fence-pcmk-2
>>     Enabled on: pcmk-1 (score:INFINITY)
>> Ordering Constraints:
>>   start ClusterIP-clone then start Ftp-clone (kind:Mandatory)
>>   start ClusterIP-clone then start Sftp-clone (kind:Mandatory)
>>   promote HomeDataClone then start HomeFS-clone (kind:Mandatory)
>>   start HomeFS-clone then start Ftp-clone (kind:Mandatory)
>>   start HomeFS-clone then start Sftp-clone (kind:Mandatory)
>>   start dlm-clone then start HomeFS-clone (kind:Mandatory)
>> Colocation Constraints:
>>   Ftp-clone with ClusterIP-clone (score:INFINITY)
>>   Sftp-clone with ClusterIP-clone (score:INFINITY)
>>   HomeFS-clone with HomeDataClone (score:INFINITY) (with-rsc-role:Master)
>>   Ftp-clone with HomeFS-clone (score:INFINITY)
>>   Sftp-clone with HomeFS-clone (score:INFINITY)
>>   HomeFS-clone with dlm-clone (score:INFINITY)
>>
>> I think my problem comes from the demotion of my surviving primary
>> (which has the up-to-date DRBD data).
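>> To inspect the scores behind that demote decision, I believe
>> crm_simulate can dump them from the live CIB (assuming I read its man
>> page correctly):
>>
>> crm_simulate -sL   # -L = live cluster state, -s = show allocation scores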
>> Any idea? Thanks.
>>
>> Eric
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
--
Eric