[ClusterLabs] Failover problem with dual primary drbd

Eric Bourguinat eric.bourguinat at steady-sun.com
Thu Jun 23 07:14:09 UTC 2016


My drbd config

drbdadm dump all

# /etc/drbd.conf
global {
     usage-count yes;
     cmd-timeout-medium 600;
     cmd-timeout-long 0;
}

common {
}

# resource home on ftpprod04: not ignored, not stacked
# defined at /etc/drbd.d/home.res:1
resource home {
     on ftpprod04 {
         device           /dev/drbd1 minor 1;
         disk             /dev/vghome/lvhome;
         meta-disk        internal;
         address          ipv4 192.168.122.101:7789;
     }
     on ftpprod05 {
         device           /dev/drbd1 minor 1;
         disk             /dev/vghome/lvhome;
         meta-disk        internal;
         address          ipv4 192.168.122.102:7789;
     }
     net {
         protocol         C;
         verify-alg       sha1;
         allow-two-primaries yes;
         after-sb-0pri    discard-zero-changes;
         after-sb-1pri    discard-secondary;
         after-sb-2pri    disconnect;
         sndbuf-size      512k;
     }
     disk {
         resync-rate      110M;
         on-io-error      detach;
         fencing          resource-and-stonith;
         al-extents       3389;
     }
     handlers {
         split-brain      "/usr/lib/drbd/notify-split-brain.sh ******************";
         fence-peer       /usr/lib/drbd/crm-fence-peer.sh;
         after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
     }
}
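
For reference, the fence-peer handler above (crm-fence-peer.sh) is what adds
the drbd-fence-by-handler-home-HomeDataClone constraint seen in the quoted log
below. The following is only a rough Python sketch of how that rule's
expression would evaluate, not Pacemaker's actual code. It assumes that #uname
is compared against the cluster node names (pcmk-1, pcmk-2) rather than the
kernel hostnames (ftpprod04, ftpprod05); the helper names rule_matches and
banned_from_master are made up for illustration.

```python
# Values copied from the quoted log and "pcs status" output.
RULE = {
    "role": "Master",
    "score": "-INFINITY",
    "attribute": "#uname",
    "operation": "ne",
    "value": "ftpprod04",
}
CLUSTER_NODES = ["pcmk-1", "pcmk-2"]


def rule_matches(rule, node_name):
    """Return True if the rule's expression holds for the given node.

    Assumption: #uname resolves to the cluster node name. Only the
    "eq" and "ne" operations are handled in this sketch.
    """
    if rule["operation"] == "ne":
        return node_name != rule["value"]
    if rule["operation"] == "eq":
        return node_name == rule["value"]
    raise ValueError("unsupported operation: " + rule["operation"])


def banned_from_master(rule, nodes):
    """List the nodes on which the rule's -INFINITY Master score applies."""
    return [node for node in nodes if rule_matches(rule, node)]


if __name__ == "__main__":
    # Node names as the cluster reports them.
    print(banned_from_master(RULE, CLUSTER_NODES))
    # Hypothetical case where node names equal the hostnames.
    print(banned_from_master(RULE, ["ftpprod04", "ftpprod05"]))
```

If that assumption holds, the expression is true on both pcmk-1 and pcmk-2,
which would be consistent with the demote of pcmk-1 in the log. With node
names equal to the hostnames, only ftpprod05 would match.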

Eric

On 23/06/2016 08:47, Eric Bourguinat wrote:
> Hello,
>
> centos 7.2.1511 - pacemaker 1.1.13 - corosync 2.3.4 - drbd 8.4.7-1 - 
> drbd84-utils 8.9.5
> Linux ftpprod04 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux => pcmk-1
> Linux ftpprod05 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux => pcmk-2
>
> source : 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/
>
> My resources are:
>
> pcs resource
>  Master/Slave Set: HomeDataClone [HomeData]
>      Masters: [ pcmk-1 pcmk-2 ]
>  Clone Set: dlm-clone [dlm]
>      Started: [ pcmk-1 pcmk-2 ]
>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>      ClusterIP:0    (ocf::heartbeat:IPaddr2):    Started pcmk-1
>      ClusterIP:1    (ocf::heartbeat:IPaddr2):    Started pcmk-2
>  Clone Set: HomeFS-clone [HomeFS]
>      Started: [ pcmk-1 pcmk-2 ]
>  Clone Set: Ftp-clone [Ftp]
>      Started: [ pcmk-1 pcmk-2 ]
>  Clone Set: Sftp-clone [Sftp]
>      Started: [ pcmk-1 pcmk-2 ]
>
> I have a problem when testing failover.
>
> I run "pkill -9 corosync" on pcmk-2:
> - stonith reboots pcmk-2 from pcmk-1
> - a constraint is set
> Jun 22 10:34:36 [1802] ftpprod04        cib:     info: 
> cib_perform_op:  ++ /cib/configuration/constraints: <rsc_location 
> rsc="HomeDataClone" id="drbd-fence-by-handler-home-HomeDataClone"/>
> Jun 22 10:34:36 [1802] ftpprod04        cib:     info: 
> cib_perform_op:  ++                                    <rule 
> role="Master" score="-INFINITY" 
> id="drbd-fence-by-handler-home-rule-HomeDataClone">
> Jun 22 10:34:36 [1802] ftpprod04        cib:     info: 
> cib_perform_op:  ++ <expression attribute="#uname" operation="ne" 
> value="ftpprod04" id="drbd-fence-by-handler-home-expr-HomeDataClone"/>
> Jun 22 10:34:36 [1802] ftpprod04        cib:     info: 
> cib_perform_op:  ++ </rule>
> Jun 22 10:34:36 [1802] ftpprod04        cib:     info: 
> cib_perform_op:  ++ </rsc_location>
> Jun 22 10:34:36 [1802] ftpprod04        cib:     info: 
> cib_process_request:     Completed cib_create operation for section 
> constraints: OK (rc=0, origin=pcmk-1/cibadmin/2, version=0.584.0)
> - but
> Jun 22 10:34:36 [1806] ftpprod04    pengine:   notice: 
> LogActions:      Demote  HomeData:0      (Master -> Slave pcmk-1)
> Why does pengine demote my surviving node?
> - the result is that all services of the cluster are stopped
> Stack: corosync
> Current DC: pcmk-1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition 
> with quorum
> 2 nodes and 14 resources configured
>
> Online: [ pcmk-1 ]
> OFFLINE: [ pcmk-2 ]
>
> Full list of resources:
>
>  Master/Slave Set: HomeDataClone [HomeData]
>      Stopped: [ pcmk-1 pcmk-2 ]
>  Clone Set: dlm-clone [dlm]
>      Stopped: [ pcmk-1 pcmk-2 ]
>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>      ClusterIP:0    (ocf::heartbeat:IPaddr2):    Stopped
>      ClusterIP:1    (ocf::heartbeat:IPaddr2):    Stopped
>  Clone Set: HomeFS-clone [HomeFS]
>      Stopped: [ pcmk-1 pcmk-2 ]
>  Clone Set: Ftp-clone [Ftp]
>      Stopped: [ pcmk-1 pcmk-2 ]
>  Clone Set: Sftp-clone [Sftp]
>      Stopped: [ pcmk-1 pcmk-2 ]
>  fence-pcmk-1    (stonith:fence_ovh):    Stopped
>  fence-pcmk-2    (stonith:fence_ovh):    Stopped
>
> PCSD Status:
>   pcmk-1: Online
>   pcmk-2: Online
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> - if I start the cluster on pcmk-2, drbd resyncs, both nodes become
> primary, the constraint is removed, and all the services are started
>
> My constraints:
> pcs constraint
> Location Constraints:
>   Resource: fence-pcmk-1
>     Enabled on: pcmk-2 (score:INFINITY)
>   Resource: fence-pcmk-2
>     Enabled on: pcmk-1 (score:INFINITY)
> Ordering Constraints:
>   start ClusterIP-clone then start Ftp-clone (kind:Mandatory)
>   start ClusterIP-clone then start Sftp-clone (kind:Mandatory)
>   promote HomeDataClone then start HomeFS-clone (kind:Mandatory)
>   start HomeFS-clone then start Ftp-clone (kind:Mandatory)
>   start HomeFS-clone then start Sftp-clone (kind:Mandatory)
>   start dlm-clone then start HomeFS-clone (kind:Mandatory)
> Colocation Constraints:
>   Ftp-clone with ClusterIP-clone (score:INFINITY)
>   Sftp-clone with ClusterIP-clone (score:INFINITY)
>   HomeFS-clone with HomeDataClone (score:INFINITY) (with-rsc-role:Master)
>   Ftp-clone with HomeFS-clone (score:INFINITY)
>   Sftp-clone with HomeFS-clone (score:INFINITY)
>   HomeFS-clone with dlm-clone (score:INFINITY)
>
> I think my problem comes from the demotion of my surviving primary
> (which has up-to-date drbd data).
> Any ideas? Thanks.
>
> Eric


-- 
Eric




