[ClusterLabs] DRBD demote/promote not called - Why? How to fix?

Ken Gaillot kgaillot at redhat.com
Thu Nov 10 18:37:07 UTC 2016


On 11/09/2016 12:27 PM, CART Andreas wrote:
> Hi again
> 
>  
> 
> Sorry for overlooking the missing master role in the colocation
> constraint.
> 
> I added it - but unfortunately still no success.
> 
>  
> 
> (In the meantime I added 2 additional filesystem resources on top of the
> NFSServer, but that should not change anything regarding the root
> problem, namely the missing demote of DRBDClone.)
> 
>  
> 
> I again started with all resources located at ventsi-clst1 and issued a
> 'pcs resource move DRBD_global_clst' (the resource colocated directly
> with the DRBDClone).
> 
>  
> 
> With that I end up with all primitive resources stopped and the
> DRBDClone resource still being master at ventsi-clst1.
> 
> Here is what pacemaker claims has to be done:
> 
> ==================================================================
> 
> [root at ventsi-clst2 ~]# crm_simulate -Ls
> 
>  
> 
> Current cluster status:
> 
> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
> 
>  
> 
> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started
> ventsi-clst2-sync
> 
> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started
> ventsi-clst1-sync
> 
> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Stopped
> 
> NFSServer      (ocf::heartbeat:nfsserver):     Stopped
> 
> Master/Slave Set: DRBDClone [DRBD]
> 
>      Masters: [ ventsi-clst1-sync ]    <=== still not demoted
> 
>      Slaves: [ ventsi-clst2-sync ]
> 
> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Stopped
> 
> NFS_global_clst        (ocf::heartbeat:Filesystem):    Stopped
> 
> BIND_global_clst       (ocf::heartbeat:Filesystem):    Stopped
> 
>  
> 
> Allocation scores:
> 
> native_color: ipmi-fence-clst1 allocation score on ventsi-clst1-sync:
> -INFINITY
> 
> native_color: ipmi-fence-clst1 allocation score on ventsi-clst2-sync:
> INFINITY
> 
> native_color: ipmi-fence-clst2 allocation score on ventsi-clst1-sync:
> INFINITY
> 
> native_color: ipmi-fence-clst2 allocation score on ventsi-clst2-sync:
> -INFINITY
> 
> clone_color: DRBDClone allocation score on ventsi-clst1-sync: 0
> 
> clone_color: DRBDClone allocation score on ventsi-clst2-sync: 0
> 
> clone_color: DRBD:0 allocation score on ventsi-clst1-sync: INFINITY
> 
> clone_color: DRBD:0 allocation score on ventsi-clst2-sync: 0
> 
> clone_color: DRBD:1 allocation score on ventsi-clst1-sync: 0
> 
> clone_color: DRBD:1 allocation score on ventsi-clst2-sync: INFINITY
> 
> native_color: DRBD:0 allocation score on ventsi-clst1-sync: INFINITY
> 
> native_color: DRBD:0 allocation score on ventsi-clst2-sync: 0
> 
> native_color: DRBD:1 allocation score on ventsi-clst1-sync: -INFINITY
> 
> native_color: DRBD:1 allocation score on ventsi-clst2-sync: INFINITY
> 
> DRBD:1 promotion score on ventsi-clst2-sync: 10000
> 
> DRBD:0 promotion score on ventsi-clst1-sync: 1
> 
> native_color: DRBD_global_clst allocation score on ventsi-clst1-sync:
> -INFINITY
> 
> native_color: DRBD_global_clst allocation score on ventsi-clst2-sync:
> INFINITY
> 
> native_color: IPaddrNFS allocation score on ventsi-clst1-sync: -INFINITY
> 
> native_color: IPaddrNFS allocation score on ventsi-clst2-sync: 0
> 
> native_color: NFSServer allocation score on ventsi-clst1-sync: -INFINITY
> 
> native_color: NFSServer allocation score on ventsi-clst2-sync: 0
> 
> native_color: NFS_global_clst allocation score on ventsi-clst1-sync: 0
> 
> native_color: NFS_global_clst allocation score on ventsi-clst2-sync:
> -INFINITY
> 
> native_color: BIND_global_clst allocation score on ventsi-clst1-sync:
> -INFINITY
> 
> native_color: BIND_global_clst allocation score on ventsi-clst2-sync: 0
> 
>  
> 
> Transition Summary:
> 
> * Start   IPaddrNFS    (ventsi-clst2-sync)
> 
> * Start   NFSServer    (ventsi-clst2-sync)
> 
> * Demote  DRBD:0       (Master -> Slave ventsi-clst1-sync)    <=== this
> demote never happens
> 
> * Promote DRBD:1       (Slave -> Master ventsi-clst2-sync)
> 
> * Start   DRBD_global_clst     (ventsi-clst2-sync)
> 
> * Start   NFS_global_clst      (ventsi-clst1-sync)
> 
> * Start   BIND_global_clst     (ventsi-clst2-sync)

Strangely, this sequence appears to be ignoring the constraint "start
DRBD_global_clst then start IPaddrNFS".

Can you open a bug report at http://bugs.clusterlabs.org/ and attach the
CIB (or pe-input file) in use at this time?
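
If it helps, something like the following should capture the needed
files (a sketch assuming the stock paths shown in your crm_simulate
output; adjust the pe-input number to the transition in question):

  # dump the live CIB from either node
  pcs cluster cib > /tmp/cib.xml
  # or copy the exact policy engine input that transition used
  cp /var/lib/pacemaker/pengine/pe-input-1157.bz2 /tmp/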

For testing purposes, you may want to try replacing the "start
DRBD_global_clst then start IPaddrNFS" constraint with "promote
DRBDClone then start IPaddrNFS" to see whether that makes a difference.
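
An untested sketch of the pcs commands for that swap, using the
constraint ids from your config below (double-check them with
'pcs constraint --full'):

  pcs constraint remove order-DRBD_global_clst-IPaddrNFS-mandatory
  pcs constraint order promote DRBDClone then start IPaddrNFS
  # also clear the ban left behind by the earlier move before retesting
  pcs constraint remove cli-ban-DRBD_global_clst-on-ventsi-clst1-sync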

> And this is the executed transition:
> 
> ==================================================================
> 
> [root at ventsi-clst2 ~]# crm_simulate --xml-file
> /var/lib/pacemaker/pengine/pe-input-1157.bz2 --save-graph problem5.graph
> --save-dotfile problem5.dot -V --simulate
> 
> Using the original execution date of: 2016-11-09 17:54:10Z
> 
>  
> 
> Current cluster status:
> 
> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
> 
>  
> 
> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started
> ventsi-clst2-sync
> 
> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started
> ventsi-clst1-sync
> 
> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync
> 
> NFSServer      (ocf::heartbeat:nfsserver):     Started ventsi-clst1-sync
> 
> Master/Slave Set: DRBDClone [DRBD]
> 
>      Masters: [ ventsi-clst1-sync ]
> 
>      Slaves: [ ventsi-clst2-sync ]
> 
> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started
> ventsi-clst1-sync
> 
> NFS_global_clst        (ocf::heartbeat:Filesystem):    Started
> ventsi-clst2-sync
> 
> BIND_global_clst       (ocf::heartbeat:Filesystem):    Started
> ventsi-clst1-sync
> 
>  
> 
> Transition Summary:
> 
> * Stop    IPaddrNFS    (ventsi-clst1-sync)
> 
> * Stop    NFSServer    (ventsi-clst1-sync)
> 
> * Stop    DRBD_global_clst     (ventsi-clst1-sync)
> 
> * Stop    NFS_global_clst      (Started ventsi-clst2-sync)
> 
> * Stop    BIND_global_clst     (ventsi-clst1-sync)
> 
>  
> 
> Executing cluster transition:
> 
> * Resource action: NFS_global_clst stop on ventsi-clst2-sync
> 
> * Resource action: BIND_global_clst stop on ventsi-clst1-sync
> 
> * Resource action: NFSServer       stop on ventsi-clst1-sync
> 
> * Resource action: IPaddrNFS       stop on ventsi-clst1-sync
> 
> * Resource action: DRBD_global_clst stop on ventsi-clst1-sync
> 
> * Pseudo action:   all_stopped    <=== no demote
> 
> Using the original execution date of: 2016-11-09 17:54:10Z
> 
>  
> 
> Revised cluster status:
> 
> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
> 
>  
> 
> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started
> ventsi-clst2-sync
> 
> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started
> ventsi-clst1-sync
> 
> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Stopped
> 
> NFSServer      (ocf::heartbeat:nfsserver):     Stopped
> 
> Master/Slave Set: DRBDClone [DRBD]
> 
>      Masters: [ ventsi-clst1-sync ]
> 
>      Slaves: [ ventsi-clst2-sync ]
> 
> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Stopped
> 
> NFS_global_clst        (ocf::heartbeat:Filesystem):    Stopped
> 
> BIND_global_clst       (ocf::heartbeat:Filesystem):    Stopped
> 
>  
> 
> And finally here the updated config:
> 
> ==================================================================
> 
> [root at ventsi-clst1 ~]# pcs config
> 
> Cluster Name: clst1
> 
> Corosync Nodes:
> 
> ventsi-clst1-sync ventsi-clst2-sync
> 
> Pacemaker Nodes:
> 
> ventsi-clst1-sync ventsi-clst2-sync
> 
>  
> 
> Resources:
> 
> Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)
> 
>   Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24
> 
>   Operations: start interval=0 timeout=20 (IPaddrNFS-start-interval-0)
> 
>               stop interval=0 timeout=20 (IPaddrNFS-stop-interval-0)
> 
>               monitor interval=10 timeout=20 (IPaddrNFS-monitor-interval-10)
> 
> Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)
> 
>   Attributes:
> nfs_shared_infodir=/drbdmnts/global_clst/nfsserversettings/
> nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"
> 
>   Operations: start interval=0 timeout=40 (NFSServer-start-interval-0)
> 
>               stop interval=0 timeout=20 (NFSServer-stop-interval-0)
> 
>               monitor interval=10 timeout=20 (NFSServer-monitor-interval-10)
> 
> Master: DRBDClone
> 
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
> 
>   Resource: DRBD (class=ocf provider=linbit type=drbd)
> 
>    Attributes: drbd_resource=nfsdata
> 
>    Operations: start interval=0 timeout=240 (DRBD-start-interval-0)
> 
>                promote interval=0 timeout=90 (DRBD-promote-interval-0)
> 
>                demote interval=0 timeout=90 (DRBD-demote-interval-0)
> 
>                stop interval=0 timeout=100 (DRBD-stop-interval-0)
> 
>                monitor interval=9 role=Master timeout=5
> (DRBD-monitor-interval-9)
> 
>                monitor interval=10 role=Slave timeout=5
> (DRBD-monitor-interval-10)
> 
> Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)
> 
>   Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst fstype=ext4
> 
>   Operations: start interval=0 timeout=60
> (DRBD_global_clst-start-interval-0)
> 
>               stop interval=0 timeout=60 (DRBD_global_clst-stop-interval-0)
> 
>               monitor interval=20 timeout=40
> (DRBD_global_clst-monitor-interval-20)
> 
> Resource: NFS_global_clst (class=ocf provider=heartbeat type=Filesystem)
> 
>   Attributes: device=xxx.xxx.xxx.xxx:/drbdmnts/global_clst/nfs
> directory=/global/nfs fstype=nfs
> 
>   Operations: start interval=0 timeout=60 (NFS_global_clst-start-interval-0)
> 
>               stop interval=0 timeout=60 (NFS_global_clst-stop-interval-0)
> 
>               monitor interval=20 timeout=40
> (NFS_global_clst-monitor-interval-20)
> 
> Resource: BIND_global_clst (class=ocf provider=heartbeat type=Filesystem)
> 
>   Attributes: device=/drbdmnts/global_clst/nfs directory=/global/nfs
> fstype=none options=bind
> 
>   Operations: start interval=0 timeout=60
> (BIND_global_clst-start-interval-0)
> 
>               stop interval=0 timeout=60 (BIND_global_clst-stop-interval-0)
> 
>               monitor interval=20 timeout=40
> (BIND_global_clst-monitor-interval-20)
> 
>  
> 
> Stonith Devices:
> 
> Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)
> 
>   Attributes: lanplus=1 login=foo passwd=bar action=reboot
> ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list
> pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1
> 
>   Operations: monitor interval=60 (ipmi-fence-clst1-monitor-interval-60)
> 
> Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)
> 
>   Attributes: lanplus=1 login=foo passwd=bar action=reboot
> ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list
> pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1
> 
>   Operations: monitor interval=60 (ipmi-fence-clst2-monitor-interval-60)
> 
> Fencing Levels:
> 
>  
> 
> Location Constraints:
> 
>   Resource: DRBD_global_clst
> 
>     Disabled on: ventsi-clst1-sync (score:-INFINITY) (role: Started)
> (id:cli-ban-DRBD_global_clst-on-ventsi-clst1-sync)
> 
>   Resource: ipmi-fence-clst1
> 
>     Disabled on: ventsi-clst1-sync (score:-INFINITY)
> (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)
> 
>   Resource: ipmi-fence-clst2
> 
>     Disabled on: ventsi-clst2-sync (score:-INFINITY)
> (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)
> 
> Ordering Constraints:
> 
>   start IPaddrNFS then start NFSServer (kind:Mandatory)
> (id:order-IPaddrNFS-NFSServer-mandatory)
> 
>   promote DRBDClone then start DRBD_global_clst (kind:Mandatory)
> (id:order-DRBDClone-DRBD_global_clst-mandatory)
> 
>   start DRBD_global_clst then start IPaddrNFS (kind:Mandatory)
> (id:order-DRBD_global_clst-IPaddrNFS-mandatory)
> 
>   start NFSServer then start NFS_global_clst (kind:Mandatory)
> (id:order-NFSServer-NFS_global_clst-mandatory)
> 
>   start NFSServer then start BIND_global_clst (kind:Mandatory)
> (id:order-NFSServer-BIND_global_clst-mandatory)
> 
> Colocation Constraints:
> 
>   NFSServer with IPaddrNFS (score:INFINITY)
> (id:colocation-NFSServer-IPaddrNFS-INFINITY)
> 
>   IPaddrNFS with DRBD_global_clst (score:INFINITY)
> (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)
> 
>   NFS_global_clst with NFSServer (score:-INFINITY)
> (id:colocation-NFS_global_clst-NFSServer--INFINITY)
> 
>   BIND_global_clst with NFSServer (score:INFINITY)
> (id:colocation-BIND_global_clst-NFSServer-INFINITY)
> 
>   DRBD_global_clst with DRBDClone (score:INFINITY) (rsc-role:Started)
> (with-rsc-role:Master) (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)
> 
>  
> 
> Resources Defaults:
> 
> resource-stickiness: INFINITY
> 
> Operations Defaults:
> 
> timeout: 10s
> 
>  
> 
> Cluster Properties:
> 
> cluster-infrastructure: cman
> 
> dc-version: 1.1.14-8.el6-70404b0
> 
> have-watchdog: false
> 
> last-lrm-refresh: 1478703150
> 
> no-quorum-policy: ignore
> 
> stonith-enabled: true
> 
> symmetric-cluster: true
> 
> Node Attributes:
> 
> ventsi-clst1-sync: PostgresSon-data-status=DISCONNECT
> 
> ventsi-clst2-sync: PostgresSon-data-status=DISCONNECT
> 
>  
> 
>  
> 
> Kind regards
> 
> Andi
> 
>  
> 
> -----Original Message-----
> From: Ken Gaillot [mailto:kgaillot at redhat.com]
> Sent: Dienstag, 8. November 2016 22:29
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] DRBD demote/promote not called - Why? How to fix?
> 
>  
> 
> On 11/04/2016 01:57 PM, CART Andreas wrote:
> 
>> Hi
> 
>> 
> 
>> I have a basic 2 node active/passive cluster with Pacemaker (1.1.14 ,
> 
>> pcs: 0.9.148) / CMAN (3.0.12.1) / Corosync (1.4.7) on RHEL 6.8.
> 
>> This cluster runs NFS on top of DRBD (8.4.4).
> 
>> 
> 
>> Basically the system is working on both nodes and I can switch the
> 
>> resources from one node to the other.
> 
>> But switching resources to the other node does not work if I try to
> 
>> move just one resource and have the others follow due to the location
> 
>> constraints.
> 
>> 
> 
>> From the logged messages I see that in this “failure case” there is NO
> 
>> attempt to demote/promote the DRBD clone resource.
> 
>> 
> 
>> Here is my setup:
> 
>> ==================================================================
> 
>> Cluster Name: clst1
> 
>> Corosync Nodes:
> 
>> ventsi-clst1-sync ventsi-clst2-sync
> 
>> Pacemaker Nodes:
> 
>> ventsi-clst1-sync ventsi-clst2-sync
> 
>> 
> 
>> Resources:
> 
>> Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)
> 
>>   Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24
> 
>>   Operations: start interval=0s timeout=20s (IPaddrNFS-start-interval-0s)
> 
>>               stop interval=0s timeout=20s (IPaddrNFS-stop-interval-0s)
> 
>>               monitor interval=5s (IPaddrNFS-monitor-interval-5s)
> 
>> Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)
> 
>>   Attributes: nfs_shared_infodir=/var/lib/nfsserversettings/
> 
>> nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"
> 
>>   Operations: start interval=0s timeout=40 (NFSServer-start-interval-0s)
> 
>>               stop interval=0s timeout=20s (NFSServer-stop-interval-0s)
> 
>>               monitor interval=10s timeout=20s
> 
>> (NFSServer-monitor-interval-10s)
> 
>> Master: DRBDClone
> 
>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> 
>> clone-node-max=1 notify=true
> 
>>   Resource: DRBD (class=ocf provider=linbit type=drbd)
> 
>>    Attributes: drbd_resource=nfsdata
> 
>>    Operations: start interval=0s timeout=240 (DRBD-start-interval-0s)
> 
>>                promote interval=0s timeout=90 (DRBD-promote-interval-0s)
> 
>>                demote interval=0s timeout=90 (DRBD-demote-interval-0s)
> 
>>                stop interval=0s timeout=100 (DRBD-stop-interval-0s)
> 
>>                monitor interval=1s timeout=5 (DRBD-monitor-interval-1s)
> 
>> Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)
> 
>>   Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst
> fstype=ext4
> 
>>   Operations: start interval=0s timeout=60
> 
>> (DRBD_global_clst-start-interval-0s)
> 
>>               stop interval=0s timeout=60
> 
>> (DRBD_global_clst-stop-interval-0s)
> 
>>               monitor interval=20 timeout=40
> 
>> (DRBD_global_clst-monitor-interval-20)
> 
>> 
> 
>> Stonith Devices:
> 
>> Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)
> 
>>   Attributes: lanplus=1 login=foo passwd=bar action=reboot
> 
>> ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list
> 
>> pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1
> 
>>   Operations: monitor interval=60s (ipmi-fence-clst1-monitor-interval-60s)
> 
>> Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)
> 
>>   Attributes: lanplus=1 login=foo passwd=bar action=reboot
> 
>> ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list
> 
>> pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1
> 
>>   Operations: monitor interval=60s (ipmi-fence-clst2-monitor-interval-60s)
> 
>> Fencing Levels:
> 
>> 
> 
>> Location Constraints:
> 
>>   Resource: ipmi-fence-clst1
> 
>>     Disabled on: ventsi-clst1-sync (score:-INFINITY)
> 
>> (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)
> 
>>   Resource: ipmi-fence-clst2
> 
>>     Disabled on: ventsi-clst2-sync (score:-INFINITY)
> 
>> (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)
> 
>> Ordering Constraints:
> 
>>   start IPaddrNFS then start NFSServer (kind:Mandatory)
> 
>> (id:order-IPaddrNFS-NFSServer-mandatory)
> 
>>   promote DRBDClone then start DRBD_global_clst (kind:Mandatory)
> 
>> (id:order-DRBDClone-DRBD_global_clst-mandatory)
> 
>>   start DRBD_global_clst then start IPaddrNFS (kind:Mandatory)
> 
>> (id:order-DRBD_global_clst-IPaddrNFS-mandatory)
> 
>> Colocation Constraints:
> 
>>   NFSServer with IPaddrNFS (score:INFINITY)
> 
>> (id:colocation-NFSServer-IPaddrNFS-INFINITY)
> 
>>   DRBD_global_clst with DRBDClone (score:INFINITY)
> 
>> (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)
> 
>  
> 
> It took me a while to notice it, as it's easily overlooked, but the above
> 
> constraint is the problem. It says DRBD_global_clst must be located
> 
> where DRBDClone is running ... not necessarily where DRBDClone is
> 
> master. This constraint should be created like this:
> 
>  
> 
> pcs constraint colocation add DRBD_global_clst with master DRBDClone
> 
>  
> 
>>   IPaddrNFS with DRBD_global_clst (score:INFINITY)
> 
>> (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)
> 
>> 
> 
>> Resources Defaults:
> 
>> resource-stickiness: INFINITY
> 
>> Operations Defaults:
> 
>> timeout: 10s
> 
>> 
> 
>> Cluster Properties:
> 
>> cluster-infrastructure: cman
> 
>> dc-version: 1.1.14-8.el6-70404b0
> 
>> have-watchdog: false
> 
>> last-lrm-refresh: 1478277432
> 
>> no-quorum-policy: ignore
> 
>> stonith-enabled: true
> 
>> symmetric-cluster: true
> 
>> ==================================================================
> 
>> 
> 
>> Initial state is e.g. this (all resources at node1):
> 
>> 
> 
>> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
> 
>> 
> 
>> Full list of resources:
> 
>> 
> 
>> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started
> 
>> ventsi-clst2-sync
> 
>> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started
> 
>> ventsi-clst1-sync
> 
>> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync
> 
>> NFSServer      (ocf::heartbeat:nfsserver):     Started ventsi-clst1-sync
> 
>> Master/Slave Set: DRBDClone [DRBD]
> 
>>      Masters: [ ventsi-clst1-sync ]
> 
>>      Slaves: [ ventsi-clst2-sync ]
> 
>> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started
> 
>> ventsi-clst1-sync
> 
>> ==================================================================
> 
>> 
> 
>> If I shutdown the cluster at node 1 (‘pcs cluster stop’) or if I move
> 
>> the DRBD clone resource (‘pcs resource move DRBDClone’) all resources
> 
>> switch successfully to node2.
> 
>> I.e. the demote/promote of the DRBD clone resource is working in these
> 
>> cases.
> 
>> 
> 
>> But if I try to move any other resource (e.g. ‘pcs resource move
> 
>> NFSServer’) the resources NFSServer, IPaddrNFS and DRBD_global_clst are
> 
>> stopped at node 1, but then the DRBD_global_clst resource is
> 
>> immediately started at node2, and this fails due to the missing
> 
>> demote/promote.
> 
>> As far as I can see there is some follow-up attempt to repair things
> 
>> partially, as the resources are started again at node1, excluding the
> 
>> resource which I moved due to my move command.
> 
>> 
> 
>> Final state is like this:
> 
>> 
> 
>> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
> 
>> 
> 
>> Full list of resources:
> 
>> 
> 
>> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started
> 
>> ventsi-clst2-sync
> 
>> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started
> 
>> ventsi-clst1-sync
> 
>> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync
> 
>> NFSServer      (ocf::heartbeat:nfsserver):     Stopped
> 
>> Master/Slave Set: DRBDClone [DRBD]
> 
>>      Masters: [ ventsi-clst1-sync ]
> 
>>      Slaves: [ ventsi-clst2-sync ]
> 
>> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started
> 
>> ventsi-clst1-sync
> 
>> 
> 
>> Failed Actions:
> 
>> * DRBD_global_clst_start_0 on ventsi-clst2-sync 'unknown error' (1):
> 
>> call=778, status=complete, exitreason='none',
> 
>>     last-rc-change='Fri Nov  4 19:32:56 2016', queued=0ms, exec=43ms
> 
>> ==================================================================
> 
>> 
> 
>> Here are the logged messages for this “failure case”:
> 
>> 
> 
>> 2016-11-04T19:32:55.163982+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> 
>> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> 
>> 2016-11-04T19:32:55.168100+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> On loss of CCM Quorum: Ignore
> 
>> 2016-11-04T19:32:55.181252+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Move    IPaddrNFS#011(Started ventsi-clst1-sync -> ventsi-clst2-sync)
> 
>> 2016-11-04T19:32:55.181260+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Move    NFSServer#011(Started ventsi-clst1-sync -> ventsi-clst2-sync)
> 
>> 2016-11-04T19:32:55.181278+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Move    DRBD_global_clst#011(Started ventsi-clst1-sync ->
> 
>> ventsi-clst2-sync)  <=== here no demote/promote is listed
> 
>> 2016-11-04T19:32:55.182385+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Calculated Transition 202: /var/lib/pacemaker/pengine/pe-input-766.bz2
> 
>> 2016-11-04T19:32:55.182998+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 15: stop NFSServer_stop_0 on ventsi-clst1-sync (local)
> 
>> 2016-11-04T19:32:55.196265+01:00 ventsi-clst1
> 
>> nfsserver(NFSServer)[15978]: INFO: Stopping NFS server ...
> 
>> 2016-11-04T19:32:55.249137+01:00 ventsi-clst1 kernel: nfsd: last server
> 
>> has exited, flushing export cache
> 
>> 2016-11-04T19:32:55.252241+01:00 ventsi-clst1 rpc.mountd[15282]: Caught
> 
>> signal 15, un-registering and exiting.
> 
>> 2016-11-04T19:32:55.632708+01:00 ventsi-clst1
> 
>> nfsserver(NFSServer)[15978]: INFO: Stopping sm-notify
> 
>> 2016-11-04T19:32:55.650552+01:00 ventsi-clst1
> 
>> nfsserver(NFSServer)[15978]: INFO: Stopping rpc.statd
> 
>> 2016-11-04T19:32:55.666777+01:00 ventsi-clst1 rpc.statd[15243]: Caught
> 
>> signal 15, un-registering and exiting
> 
>> 2016-11-04T19:32:56.692819+01:00 ventsi-clst1
> 
>> nfsserver(NFSServer)[15978]: INFO: NFS server stopped
> 
>> 2016-11-04T19:32:56.695523+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Operation NFSServer_stop_0: ok (node=ventsi-clst1-sync, call=1220, rc=0,
> 
>> cib-update=1695, confirmed=true)
> 
>> 2016-11-04T19:32:56.696243+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 12: stop IPaddrNFS_stop_0 on ventsi-clst1-sync (local)
> 
>> 2016-11-04T19:32:56.727882+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16108]:
> 
>> INFO: IP status = ok, IP_CIP=
> 
>> 2016-11-04T19:32:56.733383+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Operation IPaddrNFS_stop_0: ok (node=ventsi-clst1-sync, call=1222, rc=0,
> 
>> cib-update=1696, confirmed=true)
> 
>> 2016-11-04T19:32:56.733917+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 48: stop DRBD_global_clst_stop_0 on ventsi-clst1-sync
> 
>> (local)
> 
>> 2016-11-04T19:32:56.757181+01:00 ventsi-clst1
> 
>> Filesystem(DRBD_global_clst)[16163]: INFO: Running stop for /dev/drbd1
> 
>> on /drbdmnts/global_clst
> 
>> 2016-11-04T19:32:56.764684+01:00 ventsi-clst1
> 
>> Filesystem(DRBD_global_clst)[16163]: INFO: Trying to unmount
> 
>> /drbdmnts/global_clst
> 
>> 2016-11-04T19:32:56.771260+01:00 ventsi-clst1
> 
>> Filesystem(DRBD_global_clst)[16163]: INFO: unmounted
> 
>> /drbdmnts/global_clst successfully
> 
>> 2016-11-04T19:32:56.776640+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Operation DRBD_global_clst_stop_0: ok (node=ventsi-clst1-sync,
> 
>> call=1224, rc=0, cib-update=1697, confirmed=true)
> 
>> 2016-11-04T19:32:56.777140+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 49: start DRBD_global_clst_start_0 on
> 
>> ventsi-clst2-sync   <=== here is the attempt to start the filesystem at
> 
>> the other node, although DRBD has not yet been promoted
> 
>> 2016-11-04T19:32:56.840137+01:00 ventsi-clst1 crmd[6116]:  warning:
> 
>> Action 49 (DRBD_global_clst_start_0) on ventsi-clst2-sync failed
> 
>> (target: 0 vs. rc: 1): Error
> 
>> 2016-11-04T19:32:56.840158+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Transition aborted by DRBD_global_clst_start_0 'modify' on
> 
>> ventsi-clst2-sync: Event failed
> 
>> (magic=0:1;49:202:0:b7941532-c74b-40cc-a8ad-27b5502b8fba, cib=0.649.4,
> 
>> source=match_graph_event:381, 0)
> 
>> 2016-11-04T19:32:56.840232+01:00 ventsi-clst1 crmd[6116]:  warning:
> 
>> Action 49 (DRBD_global_clst_start_0) on ventsi-clst2-sync failed
> 
>> (target: 0 vs. rc: 1): Error
> 
>> 2016-11-04T19:32:56.840328+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Transition 202 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=5,
> 
>> Source=/var/lib/pacemaker/pengine/pe-input-766.bz2): Complete
> 
>> 2016-11-04T19:32:56.843693+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> On loss of CCM Quorum: Ignore
> 
>> 2016-11-04T19:32:56.844072+01:00 ventsi-clst1 pengine[6115]:  warning:
> 
>> Processing failed op start for DRBD_global_clst on ventsi-clst2-sync:
> 
>> unknown error (1)
> 
>> 2016-11-04T19:32:56.844102+01:00 ventsi-clst1 pengine[6115]:  warning:
> 
>> Processing failed op start for DRBD_global_clst on ventsi-clst2-sync:
> 
>> unknown error (1)
> 
>> 2016-11-04T19:32:56.845071+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Start   IPaddrNFS#011(ventsi-clst2-sync)
> 
>> 2016-11-04T19:32:56.845078+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Start   NFSServer#011(ventsi-clst2-sync)
> 
>> 2016-11-04T19:32:56.845081+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Demote  DRBD:0#011(Master -> Slave ventsi-clst1-sync)   <=== here there
> 
>> would be the necessary demote/promote … but it's too late; the start of
> 
>> the filesystem already failed…
> 
>> 2016-11-04T19:32:56.845083+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Promote DRBD:1#011(Slave -> Master ventsi-clst2-sync)
> 
>> 2016-11-04T19:32:56.845084+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Recover DRBD_global_clst#011(Started ventsi-clst2-sync)
> 
>> 2016-11-04T19:32:56.847986+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Calculated Transition 203: /var/lib/pacemaker/pengine/pe-input-767.bz2 
> 
>> <=== … so the above transition gets caught by the following attempt to
> 
>> repair things partially
> 
>> 2016-11-04T19:32:56.867679+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> On loss of CCM Quorum: Ignore
> 
>> 2016-11-04T19:32:56.868074+01:00 ventsi-clst1 pengine[6115]:  warning:
> 
>> Processing failed op start for DRBD_global_clst on ventsi-clst2-sync:
> 
>> unknown error (1)
> 
>> 2016-11-04T19:32:56.868101+01:00 ventsi-clst1 pengine[6115]:  warning:
> 
>> Processing failed op start for DRBD_global_clst on ventsi-clst2-sync:
> 
>> unknown error (1)
> 
>> 2016-11-04T19:32:56.868287+01:00 ventsi-clst1 pengine[6115]:  warning:
> 
>> Forcing DRBD_global_clst away from ventsi-clst2-sync after 1000000
> 
>> failures (max=1000000)
> 
>> 2016-11-04T19:32:56.869011+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Start   IPaddrNFS#011(ventsi-clst1-sync)
> 
>> 2016-11-04T19:32:56.869023+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Recover DRBD_global_clst#011(Started ventsi-clst2-sync ->
> ventsi-clst1-sync)
> 
>> 2016-11-04T19:32:56.869770+01:00 ventsi-clst1 pengine[6115]:   notice:
> 
>> Calculated Transition 204: /var/lib/pacemaker/pengine/pe-input-768.bz2
> 
>> 2016-11-04T19:32:56.870065+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 3: stop DRBD_global_clst_stop_0 on ventsi-clst2-sync
> 
>> 2016-11-04T19:32:56.908075+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 42: start DRBD_global_clst_start_0 on
> 
>> ventsi-clst1-sync (local)
> 
>> 2016-11-04T19:32:56.931072+01:00 ventsi-clst1
> 
>> Filesystem(DRBD_global_clst)[16242]: INFO: Running start for /dev/drbd1
> 
>> on /drbdmnts/global_clst
> 
>> 2016-11-04T19:32:56.943250+01:00 ventsi-clst1 kernel: EXT4-fs (drbd1):
> 
>> warning: maximal mount count reached, running e2fsck is recommended
> 
>> 2016-11-04T19:32:56.953253+01:00 ventsi-clst1 kernel: EXT4-fs (drbd1):
> 
>> mounted filesystem with ordered data mode. Opts:
> 
>> 2016-11-04T19:32:56.964284+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Operation DRBD_global_clst_start_0: ok (node=ventsi-clst1-sync,
> 
>> call=1225, rc=0, cib-update=1701, confirmed=true)
> 
>> 2016-11-04T19:32:56.965104+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 10: start IPaddrNFS_start_0 on ventsi-clst1-sync (local)
> 
>> 2016-11-04T19:32:56.965325+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 43: monitor DRBD_global_clst_monitor_20000 on
> 
>> ventsi-clst1-sync (local)
> 
>> 2016-11-04T19:32:56.996235+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]:
> 
>> INFO: Adding inet address xxx.xxx.xxx.xxx/24 with broadcast address
> 
>> xxx.xxx.xxx.255 to device bond0
> 
>> 2016-11-04T19:32:57.002059+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]:
> 
>> INFO: Bringing device bond0 up
> 
>> 2016-11-04T19:32:57.008128+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]:
> 
>> INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
> 
>> /var/run/resource-agents/send_arp-xxx.xxx.xxx.xxx bond0 xxx.xxx.xxx.xxx
> 
>> auto not_used not_used
> 
>> 2016-11-04T19:32:57.020159+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Operation IPaddrNFS_start_0: ok (node=ventsi-clst1-sync, call=1226,
> 
>> rc=0, cib-update=1703, confirmed=true)
> 
>> 2016-11-04T19:32:57.020901+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Initiating action 11: monitor IPaddrNFS_monitor_5000 on
> 
>> ventsi-clst1-sync (local)
> 
>> 2016-11-04T19:32:57.052231+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> Transition 204 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> 
>> Source=/var/lib/pacemaker/pengine/pe-input-768.bz2): Complete
> 
>> 2016-11-04T19:32:57.052251+01:00 ventsi-clst1 crmd[6116]:   notice:
> 
>> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> 
>> cause=C_FSA_INTERNAL origin=notify_crmd ]
> 
>> ==================================================================
> 
>> 
> 
>> Any ideas what could be the reason for this behavior?
> 
>> And how could this be fixed?
> 
>> 
> 
>> 
> 
>> (I already found several articles on the internet with the
> 
>> recommendation to have two separately configured monitor operations for
> 
>> the DRBD resource, one for the master role and another one for
> 
>> the slave role.
> 
>> Already tried this to no avail.)
> 
>> 
> 
>> Regards
> 
>> Andi



