[ClusterLabs] Failover problem with dual primary drbd

Eric Bourguinat eric.bourguinat at steady-sun.com
Thu Jun 23 02:47:47 EDT 2016


Hello,

CentOS 7.2.1511 - pacemaker 1.1.13 - corosync 2.3.4 - drbd 8.4.7-1 - drbd84-utils 8.9.5
Linux ftpprod04 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux => pcmk-1
Linux ftpprod05 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux => pcmk-2

Source: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/

My resources are:

pcs resource
  Master/Slave Set: HomeDataClone [HomeData]
      Masters: [ pcmk-1 pcmk-2 ]
  Clone Set: dlm-clone [dlm]
      Started: [ pcmk-1 pcmk-2 ]
  Clone Set: ClusterIP-clone [ClusterIP] (unique)
      ClusterIP:0    (ocf::heartbeat:IPaddr2):    Started pcmk-1
      ClusterIP:1    (ocf::heartbeat:IPaddr2):    Started pcmk-2
  Clone Set: HomeFS-clone [HomeFS]
      Started: [ pcmk-1 pcmk-2 ]
  Clone Set: Ftp-clone [Ftp]
      Started: [ pcmk-1 pcmk-2 ]
  Clone Set: Sftp-clone [Sftp]
      Started: [ pcmk-1 pcmk-2 ]

I have a problem when testing failover.

I run "pkill -9 corosync" on pcmk-2:
- stonith reboots pcmk-2 from pcmk-1
- a constraint is set:
Jun 22 10:34:36 [1802] ftpprod04        cib:     info: cib_perform_op:  ++ /cib/configuration/constraints: <rsc_location rsc="HomeDataClone" id="drbd-fence-by-handler-home-HomeDataClone"/>
Jun 22 10:34:36 [1802] ftpprod04        cib:     info: cib_perform_op:  ++                                    <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-home-rule-HomeDataClone">
Jun 22 10:34:36 [1802] ftpprod04        cib:     info: cib_perform_op:  ++ <expression attribute="#uname" operation="ne" value="ftpprod04" id="drbd-fence-by-handler-home-expr-HomeDataClone"/>
Jun 22 10:34:36 [1802] ftpprod04        cib:     info: cib_perform_op:  ++                                    </rule>
Jun 22 10:34:36 [1802] ftpprod04        cib:     info: cib_perform_op:  ++ </rsc_location>
Jun 22 10:34:36 [1802] ftpprod04        cib:     info: cib_process_request:     Completed cib_create operation for section constraints: OK (rc=0, origin=pcmk-1/cibadmin/2, version=0.584.0)
- but pengine then demotes the surviving node:
Jun 22 10:34:36 [1806] ftpprod04    pengine:   notice: LogActions:      Demote  HomeData:0      (Master -> Slave pcmk-1)
Why does pengine demote my surviving node?
- the result is that all services of the cluster are stopped:
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 14 resources configured

Online: [ pcmk-1 ]
OFFLINE: [ pcmk-2 ]

Full list of resources:

  Master/Slave Set: HomeDataClone [HomeData]
      Stopped: [ pcmk-1 pcmk-2 ]
  Clone Set: dlm-clone [dlm]
      Stopped: [ pcmk-1 pcmk-2 ]
  Clone Set: ClusterIP-clone [ClusterIP] (unique)
      ClusterIP:0    (ocf::heartbeat:IPaddr2):    Stopped
      ClusterIP:1    (ocf::heartbeat:IPaddr2):    Stopped
  Clone Set: HomeFS-clone [HomeFS]
      Stopped: [ pcmk-1 pcmk-2 ]
  Clone Set: Ftp-clone [Ftp]
      Stopped: [ pcmk-1 pcmk-2 ]
  Clone Set: Sftp-clone [Sftp]
      Stopped: [ pcmk-1 pcmk-2 ]
  fence-pcmk-1    (stonith:fence_ovh):    Stopped
  fence-pcmk-2    (stonith:fence_ovh):    Stopped

PCSD Status:
   pcmk-1: Online
   pcmk-2: Online

Daemon Status:
   corosync: active/disabled
   pacemaker: active/disabled
   pcsd: active/enabled
- if I start the cluster on pcmk-2, DRBD resyncs, both nodes become 
primary, the constraint is removed, and all the services are started
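
One way to read the rule in the constraint logged above: it gives the Master role a score of -INFINITY on every node whose #uname differs from "ftpprod04". The sketch below evaluates that single "ne" expression against a list of node names. It assumes, and this is only an assumption, that Pacemaker compares #uname with the cluster node names shown by pcs (pcmk-1, pcmk-2) rather than with the kernel hostnames (ftpprod04, ftpprod05). The helper name banned_from_master is hypothetical and is not part of any Pacemaker API.

```python
import xml.etree.ElementTree as ET

# Constraint copied from the cib_perform_op log lines above,
# with the mail line-wrapping and log prefixes removed.
CONSTRAINT = """
<rsc_location rsc="HomeDataClone" id="drbd-fence-by-handler-home-HomeDataClone">
  <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-home-rule-HomeDataClone">
    <expression attribute="#uname" operation="ne" value="ftpprod04"
                id="drbd-fence-by-handler-home-expr-HomeDataClone"/>
  </rule>
</rsc_location>
"""


def banned_from_master(constraint_xml, node_names):
    """Return the node names for which the rule's expression is true.

    Hypothetical helper: it handles only a single #uname expression with
    operation "eq" or "ne", which is all the logged rule contains.
    """
    rule = ET.fromstring(constraint_xml).find("rule")
    expr = rule.find("expression")
    if expr.get("attribute") != "#uname":
        raise ValueError("only #uname expressions are handled")
    op, value = expr.get("operation"), expr.get("value")
    if op == "ne":
        return [name for name in node_names if name != value]
    if op == "eq":
        return [name for name in node_names if name == value]
    raise ValueError(f"unsupported operation: {op}")


if __name__ == "__main__":
    # Assumption: these are the names Pacemaker uses for #uname.
    print(banned_from_master(CONSTRAINT, ["pcmk-1", "pcmk-2"]))
    # For comparison: the kernel hostnames from uname.
    print(banned_from_master(CONSTRAINT, ["ftpprod04", "ftpprod05"]))
```

Under that assumption the first call lists both pcmk-1 and pcmk-2, which would be consistent with the Demote on pcmk-1. With the kernel hostnames, only ftpprod05 is listed.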

My constraints:
pcs constraint
Location Constraints:
   Resource: fence-pcmk-1
     Enabled on: pcmk-2 (score:INFINITY)
   Resource: fence-pcmk-2
     Enabled on: pcmk-1 (score:INFINITY)
Ordering Constraints:
   start ClusterIP-clone then start Ftp-clone (kind:Mandatory)
   start ClusterIP-clone then start Sftp-clone (kind:Mandatory)
   promote HomeDataClone then start HomeFS-clone (kind:Mandatory)
   start HomeFS-clone then start Ftp-clone (kind:Mandatory)
   start HomeFS-clone then start Sftp-clone (kind:Mandatory)
   start dlm-clone then start HomeFS-clone (kind:Mandatory)
Colocation Constraints:
   Ftp-clone with ClusterIP-clone (score:INFINITY)
   Sftp-clone with ClusterIP-clone (score:INFINITY)
   HomeFS-clone with HomeDataClone (score:INFINITY) (with-rsc-role:Master)
   Ftp-clone with HomeFS-clone (score:INFINITY)
   Sftp-clone with HomeFS-clone (score:INFINITY)
   HomeFS-clone with dlm-clone (score:INFINITY)
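
To make the dependency chain in the ordering constraints above easier to see, here is a minimal sketch that sorts the listed actions so that every "first" action precedes its "then" action. It only models the mandatory ordering pairs copied from the listing. It does not model colocation, scores, or how the policy engine actually schedules actions, and the function name action_order is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# (first, then) pairs copied from the "Ordering Constraints" listing.
ORDERING = [
    ("start ClusterIP-clone", "start Ftp-clone"),
    ("start ClusterIP-clone", "start Sftp-clone"),
    ("promote HomeDataClone", "start HomeFS-clone"),
    ("start HomeFS-clone", "start Ftp-clone"),
    ("start HomeFS-clone", "start Sftp-clone"),
    ("start dlm-clone", "start HomeFS-clone"),
]


def action_order(pairs):
    """Return one valid ordering of the actions named in the pairs."""
    sorter = TopologicalSorter()
    for first, then in pairs:
        # "then" may only run after "first" has completed.
        sorter.add(then, first)
    return list(sorter.static_order())


if __name__ == "__main__":
    for action in action_order(ORDERING):
        print(action)
```

In any ordering this produces, "promote HomeDataClone" and "start dlm-clone" come before "start HomeFS-clone", which in turn comes before the Ftp-clone and Sftp-clone starts.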

I think my problem comes from the demotion of my surviving primary 
(which has up-to-date DRBD data).
Any ideas? Thanks.

Eric



