[ClusterLabs] DRBD demote/promote not called - Why? How to fix?
Ken Gaillot
kgaillot at redhat.com
Thu Nov 10 19:37:07 CET 2016
On 11/09/2016 12:27 PM, CART Andreas wrote:
> Hi again
>
> Sorry for overlooking that the master role was missing from the colocation
> constraint. I added it, but unfortunately still no success.
>
> (In the meantime I added 2 additional filesystem resources on top of the
> NFSServer, but that should not change anything regarding the root
> problem: the missing demote of DRBDClone.)
>
> I again started with all resources located at ventsi-clst1 and issued a
> 'pcs resource move DRBD_global_clst' (the resource colocated directly
> with DRBDClone).
>
> With that I end up with all primitive resources stopped and the
> DRBDClone resource still being master at ventsi-clst1.
>
> Here is what Pacemaker says has to be done:
>
> ==================================================================
>
> [root@ventsi-clst2 ~]# crm_simulate -Ls
>
> Current cluster status:
> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
>
>  ipmi-fence-clst1  (stonith:fence_ipmilan):     Started ventsi-clst2-sync
>  ipmi-fence-clst2  (stonith:fence_ipmilan):     Started ventsi-clst1-sync
>  IPaddrNFS         (ocf::heartbeat:IPaddr2):    Stopped
>  NFSServer         (ocf::heartbeat:nfsserver):  Stopped
>  Master/Slave Set: DRBDClone [DRBD]
>      Masters: [ ventsi-clst1-sync ]   <=== still not demoted
>      Slaves: [ ventsi-clst2-sync ]
>  DRBD_global_clst  (ocf::heartbeat:Filesystem): Stopped
>  NFS_global_clst   (ocf::heartbeat:Filesystem): Stopped
>  BIND_global_clst  (ocf::heartbeat:Filesystem): Stopped
>
> Allocation scores:
> native_color: ipmi-fence-clst1 allocation score on ventsi-clst1-sync: -INFINITY
> native_color: ipmi-fence-clst1 allocation score on ventsi-clst2-sync: INFINITY
> native_color: ipmi-fence-clst2 allocation score on ventsi-clst1-sync: INFINITY
> native_color: ipmi-fence-clst2 allocation score on ventsi-clst2-sync: -INFINITY
> clone_color: DRBDClone allocation score on ventsi-clst1-sync: 0
> clone_color: DRBDClone allocation score on ventsi-clst2-sync: 0
> clone_color: DRBD:0 allocation score on ventsi-clst1-sync: INFINITY
> clone_color: DRBD:0 allocation score on ventsi-clst2-sync: 0
> clone_color: DRBD:1 allocation score on ventsi-clst1-sync: 0
> clone_color: DRBD:1 allocation score on ventsi-clst2-sync: INFINITY
> native_color: DRBD:0 allocation score on ventsi-clst1-sync: INFINITY
> native_color: DRBD:0 allocation score on ventsi-clst2-sync: 0
> native_color: DRBD:1 allocation score on ventsi-clst1-sync: -INFINITY
> native_color: DRBD:1 allocation score on ventsi-clst2-sync: INFINITY
> DRBD:1 promotion score on ventsi-clst2-sync: 10000
> DRBD:0 promotion score on ventsi-clst1-sync: 1
> native_color: DRBD_global_clst allocation score on ventsi-clst1-sync: -INFINITY
> native_color: DRBD_global_clst allocation score on ventsi-clst2-sync: INFINITY
> native_color: IPaddrNFS allocation score on ventsi-clst1-sync: -INFINITY
> native_color: IPaddrNFS allocation score on ventsi-clst2-sync: 0
> native_color: NFSServer allocation score on ventsi-clst1-sync: -INFINITY
> native_color: NFSServer allocation score on ventsi-clst2-sync: 0
> native_color: NFS_global_clst allocation score on ventsi-clst1-sync: 0
> native_color: NFS_global_clst allocation score on ventsi-clst2-sync: -INFINITY
> native_color: BIND_global_clst allocation score on ventsi-clst1-sync: -INFINITY
> native_color: BIND_global_clst allocation score on ventsi-clst2-sync: 0
>
> Transition Summary:
>  * Start   IPaddrNFS         (ventsi-clst2-sync)
>  * Start   NFSServer         (ventsi-clst2-sync)
>  * Demote  DRBD:0            (Master -> Slave ventsi-clst1-sync)   <=== this demote never happens
>  * Promote DRBD:1            (Slave -> Master ventsi-clst2-sync)
>  * Start   DRBD_global_clst  (ventsi-clst2-sync)
>  * Start   NFS_global_clst   (ventsi-clst1-sync)
>  * Start   BIND_global_clst  (ventsi-clst2-sync)
Strangely, this sequence appears to be ignoring the constraint "start
DRBD_global_clst then start IPaddrNFS".
Can you open a bug report at http://bugs.clusterlabs.org/ and attach the
CIB (or pe-input file) in use at this time?
For testing purposes, you may want to try replacing the "start
DRBD_global_clst then start IPaddrNFS" constraint with "promote
DRBDClone then start IPaddrNFS" to see whether that makes a difference.
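One way to see why this swap is a reasonable test is to treat the ordering constraints as a graph over (resource, action) pairs. The Python sketch below is illustrative only: the helpers ordering_graph(), ordering_predecessors() and cib(), and the trimmed CIB snippets, are hypothetical and assume the constraints were exported as XML (for example with 'cibadmin -Q'). It shows that in the posted configuration 'promote DRBDClone' precedes 'start IPaddrNFS' only indirectly, through 'start DRBD_global_clst', whereas the suggested test constraint makes it a direct predecessor. With pcs, the swap would presumably be 'pcs constraint remove order-DRBD_global_clst-IPaddrNFS-mandatory' followed by 'pcs constraint order promote DRBDClone then start IPaddrNFS'.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def ordering_graph(cib_xml):
    """Map each (resource, action) to the (resource, action) pairs that an
    rsc_order constraint places directly before it. The kind and
    symmetrical attributes are ignored in this sketch."""
    before = defaultdict(set)
    for order in ET.fromstring(cib_xml).iter("rsc_order"):
        first_action = order.get("first-action", "start")
        # In the CIB schema, then-action defaults to the value of first-action.
        then_action = order.get("then-action", first_action)
        before[(order.get("then"), then_action)].add(
            (order.get("first"), first_action))
    return before


def ordering_predecessors(cib_xml, rsc, action):
    """Return every (resource, action) ordered before (rsc, action),
    either directly or through a chain of constraints."""
    before = ordering_graph(cib_xml)
    seen, stack = set(), [(rsc, action)]
    while stack:
        for pred in before.get(stack.pop(), ()):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen


def cib(*orders):
    """Wrap rsc_order elements in a minimal, hypothetical CIB skeleton."""
    return ("<cib><configuration><constraints>" + "".join(orders)
            + "</constraints></configuration></cib>")


# Trimmed versions of the constraints from this thread (ids shortened).
PROMOTE_THEN_FS = ('<rsc_order id="o1" first="DRBDClone" first-action="promote"'
                   ' then="DRBD_global_clst" then-action="start"/>')
FS_THEN_IP = '<rsc_order id="o2" first="DRBD_global_clst" then="IPaddrNFS"/>'
PROMOTE_THEN_IP = ('<rsc_order id="o3" first="DRBDClone" first-action="promote"'
                   ' then="IPaddrNFS" then-action="start"/>')

current = cib(PROMOTE_THEN_FS, FS_THEN_IP)         # as posted
proposed = cib(PROMOTE_THEN_FS, PROMOTE_THEN_IP)   # suggested test

for name, config in (("current", current), ("proposed", proposed)):
    direct = ordering_graph(config).get(("IPaddrNFS", "start"), set())
    everything = ordering_predecessors(config, "IPaddrNFS", "start")
    print(name, "direct:", sorted(direct), "all:", sorted(everything))
```

In both variants 'promote DRBDClone' should come before 'start IPaddrNFS', which is why the transition summary above looks wrong; the swap only makes the dependency direct. The sketch ignores the reverse (stop) ordering that Pacemaker derives from symmetrical constraints.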
> And this is the executed transition:
>
> ==================================================================
>
> [root@ventsi-clst2 ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-1157.bz2 --save-graph problem5.graph --save-dotfile problem5.dot -V --simulate
>
> Using the original execution date of: 2016-11-09 17:54:10Z
>
> Current cluster status:
> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
>
>  ipmi-fence-clst1  (stonith:fence_ipmilan):     Started ventsi-clst2-sync
>  ipmi-fence-clst2  (stonith:fence_ipmilan):     Started ventsi-clst1-sync
>  IPaddrNFS         (ocf::heartbeat:IPaddr2):    Started ventsi-clst1-sync
>  NFSServer         (ocf::heartbeat:nfsserver):  Started ventsi-clst1-sync
>  Master/Slave Set: DRBDClone [DRBD]
>      Masters: [ ventsi-clst1-sync ]
>      Slaves: [ ventsi-clst2-sync ]
>  DRBD_global_clst  (ocf::heartbeat:Filesystem): Started ventsi-clst1-sync
>  NFS_global_clst   (ocf::heartbeat:Filesystem): Started ventsi-clst2-sync
>  BIND_global_clst  (ocf::heartbeat:Filesystem): Started ventsi-clst1-sync
>
> Transition Summary:
>  * Stop    IPaddrNFS         (ventsi-clst1-sync)
>  * Stop    NFSServer         (ventsi-clst1-sync)
>  * Stop    DRBD_global_clst  (ventsi-clst1-sync)
>  * Stop    NFS_global_clst   (Started ventsi-clst2-sync)
>  * Stop    BIND_global_clst  (ventsi-clst1-sync)
>
> Executing cluster transition:
>  * Resource action: NFS_global_clst  stop on ventsi-clst2-sync
>  * Resource action: BIND_global_clst stop on ventsi-clst1-sync
>  * Resource action: NFSServer        stop on ventsi-clst1-sync
>  * Resource action: IPaddrNFS        stop on ventsi-clst1-sync
>  * Resource action: DRBD_global_clst stop on ventsi-clst1-sync
>  * Pseudo action:   all_stopped   <=== no demote
>
> Using the original execution date of: 2016-11-09 17:54:10Z
>
> Revised cluster status:
> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
>
>  ipmi-fence-clst1  (stonith:fence_ipmilan):     Started ventsi-clst2-sync
>  ipmi-fence-clst2  (stonith:fence_ipmilan):     Started ventsi-clst1-sync
>  IPaddrNFS         (ocf::heartbeat:IPaddr2):    Stopped
>  NFSServer         (ocf::heartbeat:nfsserver):  Stopped
>  Master/Slave Set: DRBDClone [DRBD]
>      Masters: [ ventsi-clst1-sync ]
>      Slaves: [ ventsi-clst2-sync ]
>  DRBD_global_clst  (ocf::heartbeat:Filesystem): Stopped
>  NFS_global_clst   (ocf::heartbeat:Filesystem): Stopped
>  BIND_global_clst  (ocf::heartbeat:Filesystem): Stopped
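The key observation in this replay, that the executed transition contains only stop actions and no demote or promote of DRBD, can also be checked mechanically when comparing several simulations. The Python sketch below uses hypothetical helpers resource_actions() and role_changes(); its regular expression assumes the '* Resource action: <resource> <action> on <node>' line format seen in the output above, which other crm_simulate versions may pad or word differently. The sample text is copied from the executed transition above.

```python
import re

# Matches lines like "* Resource action: NFSServer stop on ventsi-clst1-sync";
# leading mail-quote characters ("> ") before the "*" are tolerated.
ACTION_RE = re.compile(
    r"^[> \t]*\*[ \t]+Resource action:[ \t]+(\S+)[ \t]+(\S+)[ \t]+on[ \t]+(\S+)",
    re.MULTILINE)


def resource_actions(simulate_output):
    """Return (resource, action, node) tuples from crm_simulate text output."""
    return [m.groups() for m in ACTION_RE.finditer(simulate_output)]


def role_changes(simulate_output):
    """Keep only the promote and demote actions."""
    return [a for a in resource_actions(simulate_output)
            if a[1] in ("promote", "demote")]


# The "Executing cluster transition" section quoted above.
EXECUTED = """\
> * Resource action: NFS_global_clst stop on ventsi-clst2-sync
> * Resource action: BIND_global_clst stop on ventsi-clst1-sync
> * Resource action: NFSServer stop on ventsi-clst1-sync
> * Resource action: IPaddrNFS stop on ventsi-clst1-sync
> * Resource action: DRBD_global_clst stop on ventsi-clst1-sync
> * Pseudo action: all_stopped
"""

print(len(resource_actions(EXECUTED)), "resource actions;",
      "promote/demote:", role_changes(EXECUTED))
```

Pseudo actions such as all_stopped are deliberately not matched, since they never run on a node.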
>
> And finally, here is the updated config:
>
> ==================================================================
>
> [root@ventsi-clst1 ~]# pcs config
> Cluster Name: clst1
> Corosync Nodes:
>  ventsi-clst1-sync ventsi-clst2-sync
> Pacemaker Nodes:
>  ventsi-clst1-sync ventsi-clst2-sync
>
> Resources:
>  Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24
>   Operations: start interval=0 timeout=20 (IPaddrNFS-start-interval-0)
>               stop interval=0 timeout=20 (IPaddrNFS-stop-interval-0)
>               monitor interval=10 timeout=20 (IPaddrNFS-monitor-interval-10)
>  Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)
>   Attributes: nfs_shared_infodir=/drbdmnts/global_clst/nfsserversettings/ nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"
>   Operations: start interval=0 timeout=40 (NFSServer-start-interval-0)
>               stop interval=0 timeout=20 (NFSServer-stop-interval-0)
>               monitor interval=10 timeout=20 (NFSServer-monitor-interval-10)
>  Master: DRBDClone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>   Resource: DRBD (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=nfsdata
>    Operations: start interval=0 timeout=240 (DRBD-start-interval-0)
>                promote interval=0 timeout=90 (DRBD-promote-interval-0)
>                demote interval=0 timeout=90 (DRBD-demote-interval-0)
>                stop interval=0 timeout=100 (DRBD-stop-interval-0)
>                monitor interval=9 role=Master timeout=5 (DRBD-monitor-interval-9)
>                monitor interval=10 role=Slave timeout=5 (DRBD-monitor-interval-10)
>  Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst fstype=ext4
>   Operations: start interval=0 timeout=60 (DRBD_global_clst-start-interval-0)
>               stop interval=0 timeout=60 (DRBD_global_clst-stop-interval-0)
>               monitor interval=20 timeout=40 (DRBD_global_clst-monitor-interval-20)
>  Resource: NFS_global_clst (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=xxx.xxx.xxx.xxx:/drbdmnts/global_clst/nfs directory=/global/nfs fstype=nfs
>   Operations: start interval=0 timeout=60 (NFS_global_clst-start-interval-0)
>               stop interval=0 timeout=60 (NFS_global_clst-stop-interval-0)
>               monitor interval=20 timeout=40 (NFS_global_clst-monitor-interval-20)
>  Resource: BIND_global_clst (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/drbdmnts/global_clst/nfs directory=/global/nfs fstype=none options=bind
>   Operations: start interval=0 timeout=60 (BIND_global_clst-start-interval-0)
>               stop interval=0 timeout=60 (BIND_global_clst-stop-interval-0)
>               monitor interval=20 timeout=40 (BIND_global_clst-monitor-interval-20)
>
> Stonith Devices:
>  Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)
>   Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1
>   Operations: monitor interval=60 (ipmi-fence-clst1-monitor-interval-60)
>  Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)
>   Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1
>   Operations: monitor interval=60 (ipmi-fence-clst2-monitor-interval-60)
> Fencing Levels:
>
> Location Constraints:
>   Resource: DRBD_global_clst
>     Disabled on: ventsi-clst1-sync (score:-INFINITY) (role: Started) (id:cli-ban-DRBD_global_clst-on-ventsi-clst1-sync)
>   Resource: ipmi-fence-clst1
>     Disabled on: ventsi-clst1-sync (score:-INFINITY) (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)
>   Resource: ipmi-fence-clst2
>     Disabled on: ventsi-clst2-sync (score:-INFINITY) (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)
> Ordering Constraints:
>   start IPaddrNFS then start NFSServer (kind:Mandatory) (id:order-IPaddrNFS-NFSServer-mandatory)
>   promote DRBDClone then start DRBD_global_clst (kind:Mandatory) (id:order-DRBDClone-DRBD_global_clst-mandatory)
>   start DRBD_global_clst then start IPaddrNFS (kind:Mandatory) (id:order-DRBD_global_clst-IPaddrNFS-mandatory)
>   start NFSServer then start NFS_global_clst (kind:Mandatory) (id:order-NFSServer-NFS_global_clst-mandatory)
>   start NFSServer then start BIND_global_clst (kind:Mandatory) (id:order-NFSServer-BIND_global_clst-mandatory)
> Colocation Constraints:
>   NFSServer with IPaddrNFS (score:INFINITY) (id:colocation-NFSServer-IPaddrNFS-INFINITY)
>   IPaddrNFS with DRBD_global_clst (score:INFINITY) (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)
>   NFS_global_clst with NFSServer (score:-INFINITY) (id:colocation-NFS_global_clst-NFSServer--INFINITY)
>   BIND_global_clst with NFSServer (score:INFINITY) (id:colocation-BIND_global_clst-NFSServer-INFINITY)
>   DRBD_global_clst with DRBDClone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)
>
> Resources Defaults:
>  resource-stickiness: INFINITY
> Operations Defaults:
>  timeout: 10s
>
> Cluster Properties:
>  cluster-infrastructure: cman
>  dc-version: 1.1.14-8.el6-70404b0
>  have-watchdog: false
>  last-lrm-refresh: 1478703150
>  no-quorum-policy: ignore
>  stonith-enabled: true
>  symmetric-cluster: true
> Node Attributes:
>  ventsi-clst1-sync: PostgresSon-data-status=DISCONNECT
>  ventsi-clst2-sync: PostgresSon-data-status=DISCONNECT
>
> Kind regards
> Andi
>
> -----Original Message-----
> From: Ken Gaillot [mailto:kgaillot at redhat.com]
> Sent: Tuesday, 8 November 2016 22:29
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] DRBD demote/promote not called - Why? How to fix?
>
> On 11/04/2016 01:57 PM, CART Andreas wrote:
>> Hi
>>
>> I have a basic 2 node active/passive cluster with Pacemaker (1.1.14,
>> pcs: 0.9.148) / CMAN (3.0.12.1) / Corosync (1.4.7) on RHEL 6.8.
>> This cluster runs NFS on top of DRBD (8.4.4).
>>
>> Basically the system is working on both nodes and I can switch the
>> resources from one node to the other.
>> But switching resources to the other node does not work if I try to
>> move just one resource and have the others follow due to the colocation
>> constraints.
>>
>> From the logged messages I see that in this “failure case” there is NO
>> attempt to demote/promote the DRBD clone resource.
>>
>> Here is my setup:
>> ==================================================================
>> Cluster Name: clst1
>> Corosync Nodes:
>>  ventsi-clst1-sync ventsi-clst2-sync
>> Pacemaker Nodes:
>>  ventsi-clst1-sync ventsi-clst2-sync
>>
>> Resources:
>>  Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)
>>   Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24
>>   Operations: start interval=0s timeout=20s (IPaddrNFS-start-interval-0s)
>>               stop interval=0s timeout=20s (IPaddrNFS-stop-interval-0s)
>>               monitor interval=5s (IPaddrNFS-monitor-interval-5s)
>>  Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)
>>   Attributes: nfs_shared_infodir=/var/lib/nfsserversettings/ nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"
>>   Operations: start interval=0s timeout=40 (NFSServer-start-interval-0s)
>>               stop interval=0s timeout=20s (NFSServer-stop-interval-0s)
>>               monitor interval=10s timeout=20s (NFSServer-monitor-interval-10s)
>>  Master: DRBDClone
>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>>   Resource: DRBD (class=ocf provider=linbit type=drbd)
>>    Attributes: drbd_resource=nfsdata
>>    Operations: start interval=0s timeout=240 (DRBD-start-interval-0s)
>>                promote interval=0s timeout=90 (DRBD-promote-interval-0s)
>>                demote interval=0s timeout=90 (DRBD-demote-interval-0s)
>>                stop interval=0s timeout=100 (DRBD-stop-interval-0s)
>>                monitor interval=1s timeout=5 (DRBD-monitor-interval-1s)
>>  Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)
>>   Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst fstype=ext4
>>   Operations: start interval=0s timeout=60 (DRBD_global_clst-start-interval-0s)
>>               stop interval=0s timeout=60 (DRBD_global_clst-stop-interval-0s)
>>               monitor interval=20 timeout=40 (DRBD_global_clst-monitor-interval-20)
>>
>> Stonith Devices:
>>  Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)
>>   Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1
>>   Operations: monitor interval=60s (ipmi-fence-clst1-monitor-interval-60s)
>>  Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)
>>   Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1
>>   Operations: monitor interval=60s (ipmi-fence-clst2-monitor-interval-60s)
>> Fencing Levels:
>>
>> Location Constraints:
>>   Resource: ipmi-fence-clst1
>>     Disabled on: ventsi-clst1-sync (score:-INFINITY) (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)
>>   Resource: ipmi-fence-clst2
>>     Disabled on: ventsi-clst2-sync (score:-INFINITY) (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)
>> Ordering Constraints:
>>   start IPaddrNFS then start NFSServer (kind:Mandatory) (id:order-IPaddrNFS-NFSServer-mandatory)
>>   promote DRBDClone then start DRBD_global_clst (kind:Mandatory) (id:order-DRBDClone-DRBD_global_clst-mandatory)
>>   start DRBD_global_clst then start IPaddrNFS (kind:Mandatory) (id:order-DRBD_global_clst-IPaddrNFS-mandatory)
>> Colocation Constraints:
>>   NFSServer with IPaddrNFS (score:INFINITY) (id:colocation-NFSServer-IPaddrNFS-INFINITY)
>>   DRBD_global_clst with DRBDClone (score:INFINITY) (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)
>
> It took me a while to notice it (it's easily overlooked), but the above
> constraint is the problem. It says DRBD_global_clst must be located
> where DRBDClone is running ... not necessarily where DRBDClone is
> master. This constraint should be created like this:
>
> pcs constraint colocation add DRBD_global_clst with master DRBDClone
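Because this is easy to miss when reading 'pcs config' output, it may help to scan the whole configuration for the same pattern. The Python sketch below is a hypothetical checker, not a pcs or Pacemaker tool; it assumes the CIB has been dumped as XML (for example 'cibadmin -Q > cib.xml') and only looks at constraints of the form "X with DRBDClone". It flags every rsc_colocation whose with-rsc is a <master> resource but which lacks with-rsc-role="Master", the attribute that the "with master" keyword adds (visible as "(with-rsc-role:Master)" in the updated config above).

```python
import xml.etree.ElementTree as ET


def colocations_missing_master_role(cib_xml):
    """Return ids of rsc_colocation constraints whose with-rsc is a
    master/slave resource but which do not require with-rsc-role="Master".
    Only this direction (X with DRBDClone) is checked."""
    root = ET.fromstring(cib_xml)
    # Pacemaker 1.1 wraps master/slave clones in a <master> element.
    masters = {m.get("id") for m in root.iter("master")}
    return [c.get("id") for c in root.iter("rsc_colocation")
            if c.get("with-rsc") in masters
            and c.get("with-rsc-role") != "Master"]


# Trimmed, hypothetical CIB; {role} is filled in with or without the fix.
TEMPLATE = """<cib><configuration>
  <resources>
    <master id="DRBDClone">
      <primitive id="DRBD" class="ocf" provider="linbit" type="drbd"/>
    </master>
    <primitive id="DRBD_global_clst" class="ocf" provider="heartbeat" type="Filesystem"/>
  </resources>
  <constraints>
    <rsc_colocation id="colocation-DRBD_global_clst-DRBDClone-INFINITY"
        rsc="DRBD_global_clst" with-rsc="DRBDClone" score="INFINITY"{role}/>
  </constraints>
</configuration></cib>"""

before_fix = TEMPLATE.format(role="")
after_fix = TEMPLATE.format(role=' rsc-role="Started" with-rsc-role="Master"')

print("before fix:", colocations_missing_master_role(before_fix))
print("after fix: ", colocations_missing_master_role(after_fix))
```

Against a real cluster, the same function would simply be given the contents of the dumped cib.xml.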
>
>>   IPaddrNFS with DRBD_global_clst (score:INFINITY) (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)
>>
>> Resources Defaults:
>>  resource-stickiness: INFINITY
>> Operations Defaults:
>>  timeout: 10s
>>
>> Cluster Properties:
>>  cluster-infrastructure: cman
>>  dc-version: 1.1.14-8.el6-70404b0
>>  have-watchdog: false
>>  last-lrm-refresh: 1478277432
>>  no-quorum-policy: ignore
>>  stonith-enabled: true
>>  symmetric-cluster: true
>> ==================================================================
>>
>> Initial state is e.g. this (all resources at node1):
>>
>> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
>>
>> Full list of resources:
>>
>>  ipmi-fence-clst1  (stonith:fence_ipmilan):     Started ventsi-clst2-sync
>>  ipmi-fence-clst2  (stonith:fence_ipmilan):     Started ventsi-clst1-sync
>>  IPaddrNFS         (ocf::heartbeat:IPaddr2):    Started ventsi-clst1-sync
>>  NFSServer         (ocf::heartbeat:nfsserver):  Started ventsi-clst1-sync
>>  Master/Slave Set: DRBDClone [DRBD]
>>      Masters: [ ventsi-clst1-sync ]
>>      Slaves: [ ventsi-clst2-sync ]
>>  DRBD_global_clst  (ocf::heartbeat:Filesystem): Started ventsi-clst1-sync
>> ==================================================================
>>
>> If I shut down the cluster on node 1 (‘pcs cluster stop’) or if I move
>> the DRBD clone resource (‘pcs resource move DRBDClone’), all resources
>> switch successfully to node2.
>> I.e. the demote/promote of the DRBD clone resource is working in these
>> cases.
>>
>> But if I try to move any other resource (e.g. ‘pcs resource move
>> NFSServer’), the resources NFSServer, IPaddrNFS and DRBD_global_clst are
>> stopped on node 1, but then DRBD_global_clst is immediately started on
>> node2, which fails due to the missing demote/promote.
>> As far as I can see there is a follow-up attempt to partially repair
>> things, as the resources are started again on node1, except for the
>> resource that I moved with my move command.
>>
>> Final state is like this:
>>
>> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
>>
>> Full list of resources:
>>
>>  ipmi-fence-clst1  (stonith:fence_ipmilan):     Started ventsi-clst2-sync
>>  ipmi-fence-clst2  (stonith:fence_ipmilan):     Started ventsi-clst1-sync
>>  IPaddrNFS         (ocf::heartbeat:IPaddr2):    Started ventsi-clst1-sync
>>  NFSServer         (ocf::heartbeat:nfsserver):  Stopped
>>  Master/Slave Set: DRBDClone [DRBD]
>>      Masters: [ ventsi-clst1-sync ]
>>      Slaves: [ ventsi-clst2-sync ]
>>  DRBD_global_clst  (ocf::heartbeat:Filesystem): Started ventsi-clst1-sync
>>
>> Failed Actions:
>> * DRBD_global_clst_start_0 on ventsi-clst2-sync 'unknown error' (1):
>>     call=778, status=complete, exitreason='none',
>>     last-rc-change='Fri Nov 4 19:32:56 2016', queued=0ms, exec=43ms
>> ==================================================================
>>
>> Here are the logged messages for this “failure case”:
>>
>> 2016-11-04T19:32:55.163982+01:00 ventsi-clst1 crmd[6116]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> 2016-11-04T19:32:55.168100+01:00 ventsi-clst1 pengine[6115]: notice: On loss of CCM Quorum: Ignore
>> 2016-11-04T19:32:55.181252+01:00 ventsi-clst1 pengine[6115]: notice: Move IPaddrNFS (Started ventsi-clst1-sync -> ventsi-clst2-sync)
>> 2016-11-04T19:32:55.181260+01:00 ventsi-clst1 pengine[6115]: notice: Move NFSServer (Started ventsi-clst1-sync -> ventsi-clst2-sync)
>> 2016-11-04T19:32:55.181278+01:00 ventsi-clst1 pengine[6115]: notice: Move DRBD_global_clst (Started ventsi-clst1-sync -> ventsi-clst2-sync)   <=== here no demote/promote is listed
>> 2016-11-04T19:32:55.182385+01:00 ventsi-clst1 pengine[6115]: notice: Calculated Transition 202: /var/lib/pacemaker/pengine/pe-input-766.bz2
>> 2016-11-04T19:32:55.182998+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 15: stop NFSServer_stop_0 on ventsi-clst1-sync (local)
>> 2016-11-04T19:32:55.196265+01:00 ventsi-clst1 nfsserver(NFSServer)[15978]: INFO: Stopping NFS server ...
>> 2016-11-04T19:32:55.249137+01:00 ventsi-clst1 kernel: nfsd: last server has exited, flushing export cache
>> 2016-11-04T19:32:55.252241+01:00 ventsi-clst1 rpc.mountd[15282]: Caught signal 15, un-registering and exiting.
>> 2016-11-04T19:32:55.632708+01:00 ventsi-clst1 nfsserver(NFSServer)[15978]: INFO: Stopping sm-notify
>> 2016-11-04T19:32:55.650552+01:00 ventsi-clst1 nfsserver(NFSServer)[15978]: INFO: Stopping rpc.statd
>> 2016-11-04T19:32:55.666777+01:00 ventsi-clst1 rpc.statd[15243]: Caught signal 15, un-registering and exiting
>> 2016-11-04T19:32:56.692819+01:00 ventsi-clst1 nfsserver(NFSServer)[15978]: INFO: NFS server stopped
>> 2016-11-04T19:32:56.695523+01:00 ventsi-clst1 crmd[6116]: notice: Operation NFSServer_stop_0: ok (node=ventsi-clst1-sync, call=1220, rc=0, cib-update=1695, confirmed=true)
>> 2016-11-04T19:32:56.696243+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 12: stop IPaddrNFS_stop_0 on ventsi-clst1-sync (local)
>> 2016-11-04T19:32:56.727882+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16108]: INFO: IP status = ok, IP_CIP=
>> 2016-11-04T19:32:56.733383+01:00 ventsi-clst1 crmd[6116]: notice: Operation IPaddrNFS_stop_0: ok (node=ventsi-clst1-sync, call=1222, rc=0, cib-update=1696, confirmed=true)
>> 2016-11-04T19:32:56.733917+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 48: stop DRBD_global_clst_stop_0 on ventsi-clst1-sync (local)
>> 2016-11-04T19:32:56.757181+01:00 ventsi-clst1 Filesystem(DRBD_global_clst)[16163]: INFO: Running stop for /dev/drbd1 on /drbdmnts/global_clst
>> 2016-11-04T19:32:56.764684+01:00 ventsi-clst1 Filesystem(DRBD_global_clst)[16163]: INFO: Trying to unmount /drbdmnts/global_clst
>> 2016-11-04T19:32:56.771260+01:00 ventsi-clst1 Filesystem(DRBD_global_clst)[16163]: INFO: unmounted /drbdmnts/global_clst successfully
>> 2016-11-04T19:32:56.776640+01:00 ventsi-clst1 crmd[6116]: notice: Operation DRBD_global_clst_stop_0: ok (node=ventsi-clst1-sync, call=1224, rc=0, cib-update=1697, confirmed=true)
>> 2016-11-04T19:32:56.777140+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 49: start DRBD_global_clst_start_0 on ventsi-clst2-sync   <=== here is the attempt to start the filesystem on the other node, although DRBD has not yet been promoted
>> 2016-11-04T19:32:56.840137+01:00 ventsi-clst1 crmd[6116]: warning: Action 49 (DRBD_global_clst_start_0) on ventsi-clst2-sync failed (target: 0 vs. rc: 1): Error
>> 2016-11-04T19:32:56.840158+01:00 ventsi-clst1 crmd[6116]: notice: Transition aborted by DRBD_global_clst_start_0 'modify' on ventsi-clst2-sync: Event failed (magic=0:1;49:202:0:b7941532-c74b-40cc-a8ad-27b5502b8fba, cib=0.649.4, source=match_graph_event:381, 0)
>> 2016-11-04T19:32:56.840232+01:00 ventsi-clst1 crmd[6116]: warning: Action 49 (DRBD_global_clst_start_0) on ventsi-clst2-sync failed (target: 0 vs. rc: 1): Error
>> 2016-11-04T19:32:56.840328+01:00 ventsi-clst1 crmd[6116]: notice: Transition 202 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-input-766.bz2): Complete
>> 2016-11-04T19:32:56.843693+01:00 ventsi-clst1 pengine[6115]: notice: On loss of CCM Quorum: Ignore
>> 2016-11-04T19:32:56.844072+01:00 ventsi-clst1 pengine[6115]: warning: Processing failed op start for DRBD_global_clst on ventsi-clst2-sync: unknown error (1)
>> 2016-11-04T19:32:56.844102+01:00 ventsi-clst1 pengine[6115]: warning: Processing failed op start for DRBD_global_clst on ventsi-clst2-sync: unknown error (1)
>> 2016-11-04T19:32:56.845071+01:00 ventsi-clst1 pengine[6115]: notice: Start IPaddrNFS (ventsi-clst2-sync)
>> 2016-11-04T19:32:56.845078+01:00 ventsi-clst1 pengine[6115]: notice: Start NFSServer (ventsi-clst2-sync)
>> 2016-11-04T19:32:56.845081+01:00 ventsi-clst1 pengine[6115]: notice: Demote DRBD:0 (Master -> Slave ventsi-clst1-sync)   <=== here would be the necessary demote/promote … but it’s too late; the start of the filesystem has already failed…
>> 2016-11-04T19:32:56.845083+01:00 ventsi-clst1 pengine[6115]: notice: Promote DRBD:1 (Slave -> Master ventsi-clst2-sync)
>> 2016-11-04T19:32:56.845084+01:00 ventsi-clst1 pengine[6115]: notice: Recover DRBD_global_clst (Started ventsi-clst2-sync)
>> 2016-11-04T19:32:56.847986+01:00 ventsi-clst1 pengine[6115]: notice: Calculated Transition 203: /var/lib/pacemaker/pengine/pe-input-767.bz2   <=== … so the above transition is overtaken by the following attempt to partially repair things
>> 2016-11-04T19:32:56.867679+01:00 ventsi-clst1 pengine[6115]: notice: On loss of CCM Quorum: Ignore
>> 2016-11-04T19:32:56.868074+01:00 ventsi-clst1 pengine[6115]: warning: Processing failed op start for DRBD_global_clst on ventsi-clst2-sync: unknown error (1)
>> 2016-11-04T19:32:56.868101+01:00 ventsi-clst1 pengine[6115]: warning: Processing failed op start for DRBD_global_clst on ventsi-clst2-sync: unknown error (1)
>> 2016-11-04T19:32:56.868287+01:00 ventsi-clst1 pengine[6115]: warning: Forcing DRBD_global_clst away from ventsi-clst2-sync after 1000000 failures (max=1000000)
>> 2016-11-04T19:32:56.869011+01:00 ventsi-clst1 pengine[6115]: notice: Start IPaddrNFS (ventsi-clst1-sync)
>> 2016-11-04T19:32:56.869023+01:00 ventsi-clst1 pengine[6115]: notice: Recover DRBD_global_clst (Started ventsi-clst2-sync -> ventsi-clst1-sync)
>> 2016-11-04T19:32:56.869770+01:00 ventsi-clst1 pengine[6115]: notice: Calculated Transition 204: /var/lib/pacemaker/pengine/pe-input-768.bz2
>> 2016-11-04T19:32:56.870065+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 3: stop DRBD_global_clst_stop_0 on ventsi-clst2-sync
>> 2016-11-04T19:32:56.908075+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 42: start DRBD_global_clst_start_0 on ventsi-clst1-sync (local)
>> 2016-11-04T19:32:56.931072+01:00 ventsi-clst1 Filesystem(DRBD_global_clst)[16242]: INFO: Running start for /dev/drbd1 on /drbdmnts/global_clst
>> 2016-11-04T19:32:56.943250+01:00 ventsi-clst1 kernel: EXT4-fs (drbd1): warning: maximal mount count reached, running e2fsck is recommended
>> 2016-11-04T19:32:56.953253+01:00 ventsi-clst1 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:
>> 2016-11-04T19:32:56.964284+01:00 ventsi-clst1 crmd[6116]: notice: Operation DRBD_global_clst_start_0: ok (node=ventsi-clst1-sync, call=1225, rc=0, cib-update=1701, confirmed=true)
>> 2016-11-04T19:32:56.965104+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 10: start IPaddrNFS_start_0 on ventsi-clst1-sync (local)
>> 2016-11-04T19:32:56.965325+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 43: monitor DRBD_global_clst_monitor_20000 on ventsi-clst1-sync (local)
>> 2016-11-04T19:32:56.996235+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]: INFO: Adding inet address xxx.xxx.xxx.xxx/24 with broadcast address xxx.xxx.xxx.255 to device bond0
>> 2016-11-04T19:32:57.002059+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]: INFO: Bringing device bond0 up
>> 2016-11-04T19:32:57.008128+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-xxx.xxx.xxx.xxx bond0 xxx.xxx.xxx.xxx auto not_used not_used
>> 2016-11-04T19:32:57.020159+01:00 ventsi-clst1 crmd[6116]: notice: Operation IPaddrNFS_start_0: ok (node=ventsi-clst1-sync, call=1226, rc=0, cib-update=1703, confirmed=true)
>> 2016-11-04T19:32:57.020901+01:00 ventsi-clst1 crmd[6116]: notice: Initiating action 11: monitor IPaddrNFS_monitor_5000 on ventsi-clst1-sync (local)
>> 2016-11-04T19:32:57.052231+01:00 ventsi-clst1 crmd[6116]: notice: Transition 204 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-768.bz2): Complete
>> 2016-11-04T19:32:57.052251+01:00 ventsi-clst1 crmd[6116]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>> ==================================================================
>>
>> Any ideas what could be the reason for this behavior?
>> And how could this be fixed?
>>
>> (I already found several articles on the internet recommending two
>> separately configured monitor operations for the DRBD resource, one for
>> the master role and one for the slave role. I already tried this, to no
>> avail.)
>>
>> Regards
>> Andi
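The per-role monitor setup mentioned at the end of the question is what the updated configuration posted earlier in this thread uses (interval=9 role=Master and interval=10 role=Slave). Pacemaker identifies an operation by its name and interval, so the two monitors must use different intervals. The Python sketch below is a hypothetical checker, not part of pcs: check_monitors() reports which roles each primitive monitors and any duplicated monitor intervals, and interval_ms() is deliberately simplified to bare seconds plus "s" and "ms" suffixes (an assumption; Pacemaker accepts more units).

```python
import re
import xml.etree.ElementTree as ET


def interval_ms(value):
    """Convert an interval such as '10', '10s' or '500ms' to milliseconds.
    Simplified: only bare seconds and the 's'/'ms' suffixes are handled."""
    m = re.fullmatch(r"\s*(\d+)\s*(ms|s)?\s*", value)
    if m is None:
        raise ValueError(f"unsupported interval: {value!r}")
    number, unit = int(m.group(1)), m.group(2) or "s"
    return number if unit == "ms" else number * 1000


def check_monitors(cib_xml):
    """Return {primitive id: (roles monitored, duplicate intervals in ms)}."""
    report = {}
    for prim in ET.fromstring(cib_xml).iter("primitive"):
        roles, seen, dups = set(), set(), set()
        for op in prim.iter("op"):
            if op.get("name") != "monitor":
                continue
            # An op without a role attribute is recorded here as 'Started'.
            roles.add(op.get("role", "Started"))
            ms = interval_ms(op.get("interval", "0"))
            if ms in seen:
                dups.add(ms)
            seen.add(ms)
        report[prim.get("id")] = (roles, dups)
    return report


def drbd(*ops):
    """Build a trimmed, hypothetical CIB holding only the DRBD primitive."""
    return ('<cib><configuration><resources><master id="DRBDClone">'
            '<primitive id="DRBD" class="ocf" provider="linbit" type="drbd">'
            '<operations>' + "".join(ops) + '</operations>'
            '</primitive></master></resources></configuration></cib>')


updated = drbd(
    '<op id="DRBD-monitor-interval-9" name="monitor" interval="9" role="Master" timeout="5"/>',
    '<op id="DRBD-monitor-interval-10" name="monitor" interval="10" role="Slave" timeout="5"/>')
clashing = drbd(
    '<op id="m1" name="monitor" interval="10" role="Master" timeout="5"/>',
    '<op id="m2" name="monitor" interval="10s" role="Slave" timeout="5"/>')

print(check_monitors(updated))
print(check_monitors(clashing))
```

As the thread shows, correct per-role monitors are worth having but do not by themselves explain the missing demote; the colocation role and ordering constraints discussed above are the relevant part.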