[ClusterLabs] DRBD demote/promote not called - Why? How to fix?

Wed Nov 9 18:27:39 UTC 2016

Hi again

Sorry for overlooking the missing master role in the colocation constraint.

I added it, but unfortunately still without success.

(In the meantime I added two more filesystem resources on top of NFSServer, but that should not change anything regarding the root problem: the missing demote of DRBDClone.)

Again I started with all resources located on ventsi-clst1 and issued 'pcs resource move DRBD_global_clst' (the resource colocated with DRBDClone).

With that, I end up with all primitive resources stopped and DRBDClone still master on ventsi-clst1.
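For reference, the reproduction steps can be sketched as the sequence below. This is a dry-run sketch: the `run` helper is hypothetical and only prints each command unless `APPLY=1` is set. `pcs resource move` works by adding a location constraint (it shows up as `cli-ban-DRBD_global_clst-on-ventsi-clst1-sync` in the configuration below), and `pcs resource clear` removes that constraint again once testing is done.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print each command, execute it only if APPLY=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    fi
}

# 1. Move the filesystem away from its current node (adds a cli-ban location constraint).
run pcs resource move DRBD_global_clst

# 2. Show what the policy engine plans from the live CIB, including allocation scores.
run crm_simulate -Ls

# 3. After testing, remove the constraint that the move created.
run pcs resource clear DRBD_global_clst
```

Leaving the cli-ban constraint in place keeps DRBD_global_clst banned from ventsi-clst1-sync, so clearing it is part of resetting the test.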

Here is what Pacemaker says needs to be done:
==================================================================

[root@ventsi-clst2 ~]# crm_simulate -Ls

Current cluster status:

Online: [ ventsi-clst1-sync ventsi-clst2-sync ]

ipmi-fence-clst1       (stonith:fence_ipmilan):        Started ventsi-clst2-sync

ipmi-fence-clst2       (stonith:fence_ipmilan):        Started ventsi-clst1-sync

IPaddrNFS      (ocf::heartbeat:IPaddr2):       Stopped

NFSServer      (ocf::heartbeat:nfsserver):     Stopped

Master/Slave Set: DRBDClone [DRBD]

     Masters: [ ventsi-clst1-sync ]    <=== still not demoted

     Slaves: [ ventsi-clst2-sync ]

DRBD_global_clst       (ocf::heartbeat:Filesystem):    Stopped

NFS_global_clst        (ocf::heartbeat:Filesystem):    Stopped

BIND_global_clst       (ocf::heartbeat:Filesystem):    Stopped

Allocation scores:

native_color: ipmi-fence-clst1 allocation score on ventsi-clst1-sync: -INFINITY

native_color: ipmi-fence-clst1 allocation score on ventsi-clst2-sync: INFINITY

native_color: ipmi-fence-clst2 allocation score on ventsi-clst1-sync: INFINITY

native_color: ipmi-fence-clst2 allocation score on ventsi-clst2-sync: -INFINITY

clone_color: DRBDClone allocation score on ventsi-clst1-sync: 0

clone_color: DRBDClone allocation score on ventsi-clst2-sync: 0

clone_color: DRBD:0 allocation score on ventsi-clst1-sync: INFINITY

clone_color: DRBD:0 allocation score on ventsi-clst2-sync: 0

clone_color: DRBD:1 allocation score on ventsi-clst1-sync: 0

clone_color: DRBD:1 allocation score on ventsi-clst2-sync: INFINITY

native_color: DRBD:0 allocation score on ventsi-clst1-sync: INFINITY

native_color: DRBD:0 allocation score on ventsi-clst2-sync: 0

native_color: DRBD:1 allocation score on ventsi-clst1-sync: -INFINITY

native_color: DRBD:1 allocation score on ventsi-clst2-sync: INFINITY

DRBD:1 promotion score on ventsi-clst2-sync: 10000

DRBD:0 promotion score on ventsi-clst1-sync: 1

native_color: DRBD_global_clst allocation score on ventsi-clst1-sync: -INFINITY

native_color: DRBD_global_clst allocation score on ventsi-clst2-sync: INFINITY

native_color: IPaddrNFS allocation score on ventsi-clst1-sync: -INFINITY

native_color: IPaddrNFS allocation score on ventsi-clst2-sync: 0

native_color: NFSServer allocation score on ventsi-clst1-sync: -INFINITY

native_color: NFSServer allocation score on ventsi-clst2-sync: 0

native_color: NFS_global_clst allocation score on ventsi-clst1-sync: 0

native_color: NFS_global_clst allocation score on ventsi-clst2-sync: -INFINITY

native_color: BIND_global_clst allocation score on ventsi-clst1-sync: -INFINITY

native_color: BIND_global_clst allocation score on ventsi-clst2-sync: 0

Transition Summary:

* Start   IPaddrNFS    (ventsi-clst2-sync)

* Start   NFSServer    (ventsi-clst2-sync)

* Demote  DRBD:0       (Master -> Slave ventsi-clst1-sync)    <=== this demote never happens

* Promote DRBD:1       (Slave -> Master ventsi-clst2-sync)

* Start   DRBD_global_clst     (ventsi-clst2-sync)

* Start   NFS_global_clst      (ventsi-clst1-sync)

* Start   BIND_global_clst     (ventsi-clst2-sync)

And this is the transition that was actually executed:
==================================================================

[root@ventsi-clst2 ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-1157.bz2 --save-graph problem5.graph --save-dotfile problem5.dot -V --simulate

Using the original execution date of: 2016-11-09 17:54:10Z

Current cluster status:

Online: [ ventsi-clst1-sync ventsi-clst2-sync ]

ipmi-fence-clst1       (stonith:fence_ipmilan):        Started ventsi-clst2-sync

ipmi-fence-clst2       (stonith:fence_ipmilan):        Started ventsi-clst1-sync

IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync

NFSServer      (ocf::heartbeat:nfsserver):     Started ventsi-clst1-sync

Master/Slave Set: DRBDClone [DRBD]

     Masters: [ ventsi-clst1-sync ]

     Slaves: [ ventsi-clst2-sync ]

DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started ventsi-clst1-sync

NFS_global_clst        (ocf::heartbeat:Filesystem):    Started ventsi-clst2-sync

BIND_global_clst       (ocf::heartbeat:Filesystem):    Started ventsi-clst1-sync

Transition Summary:

* Stop    IPaddrNFS    (ventsi-clst1-sync)

* Stop    NFSServer    (ventsi-clst1-sync)

* Stop    DRBD_global_clst     (ventsi-clst1-sync)

* Stop    NFS_global_clst      (Started ventsi-clst2-sync)

* Stop    BIND_global_clst     (ventsi-clst1-sync)

Executing cluster transition:

* Resource action: NFS_global_clst stop on ventsi-clst2-sync

* Resource action: BIND_global_clst stop on ventsi-clst1-sync

* Resource action: NFSServer       stop on ventsi-clst1-sync

* Resource action: IPaddrNFS       stop on ventsi-clst1-sync

* Resource action: DRBD_global_clst stop on ventsi-clst1-sync

* Pseudo action:   all_stopped    <=== no demote

Using the original execution date of: 2016-11-09 17:54:10Z

Revised cluster status:

Online: [ ventsi-clst1-sync ventsi-clst2-sync ]

ipmi-fence-clst1       (stonith:fence_ipmilan):        Started ventsi-clst2-sync

ipmi-fence-clst2       (stonith:fence_ipmilan):        Started ventsi-clst1-sync

IPaddrNFS      (ocf::heartbeat:IPaddr2):       Stopped

NFSServer      (ocf::heartbeat:nfsserver):     Stopped

Master/Slave Set: DRBDClone [DRBD]

     Masters: [ ventsi-clst1-sync ]

     Slaves: [ ventsi-clst2-sync ]

DRBD_global_clst       (ocf::heartbeat:Filesystem):    Stopped

NFS_global_clst        (ocf::heartbeat:Filesystem):    Stopped

BIND_global_clst       (ocf::heartbeat:Filesystem):    Stopped
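To compare the two outputs above without eyeballing them, a small filter can pull only the Demote/Promote entries out of a crm_simulate Transition Summary. This is a hedged sketch, not an official crm_simulate interface: it assumes the plain-text layout shown above (a "Transition Summary:" header followed by "* Action resource ..." lines, ended by the next section header), and the function name `role_changes` is hypothetical.

```shell
#!/bin/sh
# Print "<Action> <resource>" for every Demote/Promote entry in the
# "Transition Summary:" section of crm_simulate text output read on stdin.
# Assumes the plain-text layout shown above; no output means no role change.
role_changes() {
    awk '
        /^Transition Summary:/ { in_summary = 1; next }
        # Any following section header (e.g. "Executing cluster transition:") ends the summary.
        /^[A-Z][A-Za-z ]*:[ \t]*$/ { in_summary = 0 }
        in_summary && $1 == "*" && ($2 == "Demote" || $2 == "Promote") { print $2, $3 }
    '
}

# Usage (on a cluster node):
#   crm_simulate -Ls | role_changes
#   crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-1157.bz2 --simulate | role_changes
```

On the two outputs above, the live run should list the DRBD:0 demote and the DRBD:1 promote, while the replay of pe-input-1157 should print nothing, which is exactly the discrepancy described here.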

And finally, here is the updated config:
==================================================================

[root@ventsi-clst1 ~]# pcs config

Cluster Name: clst1

Corosync Nodes:

ventsi-clst1-sync ventsi-clst2-sync

Pacemaker Nodes:

ventsi-clst1-sync ventsi-clst2-sync

Resources:

Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)

  Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24

  Operations: start interval=0 timeout=20 (IPaddrNFS-start-interval-0)

              stop interval=0 timeout=20 (IPaddrNFS-stop-interval-0)

              monitor interval=10 timeout=20 (IPaddrNFS-monitor-interval-10)

Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)

  Attributes: nfs_shared_infodir=/drbdmnts/global_clst/nfsserversettings/ nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"

  Operations: start interval=0 timeout=40 (NFSServer-start-interval-0)

              stop interval=0 timeout=20 (NFSServer-stop-interval-0)

              monitor interval=10 timeout=20 (NFSServer-monitor-interval-10)

Master: DRBDClone

  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

  Resource: DRBD (class=ocf provider=linbit type=drbd)

   Attributes: drbd_resource=nfsdata

   Operations: start interval=0 timeout=240 (DRBD-start-interval-0)

               promote interval=0 timeout=90 (DRBD-promote-interval-0)

               demote interval=0 timeout=90 (DRBD-demote-interval-0)

               stop interval=0 timeout=100 (DRBD-stop-interval-0)

               monitor interval=9 role=Master timeout=5 (DRBD-monitor-interval-9)

               monitor interval=10 role=Slave timeout=5 (DRBD-monitor-interval-10)

Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)

  Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst fstype=ext4

  Operations: start interval=0 timeout=60 (DRBD_global_clst-start-interval-0)

              stop interval=0 timeout=60 (DRBD_global_clst-stop-interval-0)

              monitor interval=20 timeout=40 (DRBD_global_clst-monitor-interval-20)

Resource: NFS_global_clst (class=ocf provider=heartbeat type=Filesystem)

  Attributes: device=xxx.xxx.xxx.xxx:/drbdmnts/global_clst/nfs directory=/global/nfs fstype=nfs

  Operations: start interval=0 timeout=60 (NFS_global_clst-start-interval-0)

              stop interval=0 timeout=60 (NFS_global_clst-stop-interval-0)

              monitor interval=20 timeout=40 (NFS_global_clst-monitor-interval-20)

Resource: BIND_global_clst (class=ocf provider=heartbeat type=Filesystem)

  Attributes: device=/drbdmnts/global_clst/nfs directory=/global/nfs fstype=none options=bind

  Operations: start interval=0 timeout=60 (BIND_global_clst-start-interval-0)

              stop interval=0 timeout=60 (BIND_global_clst-stop-interval-0)

              monitor interval=20 timeout=40 (BIND_global_clst-monitor-interval-20)

Stonith Devices:

Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)

  Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1

  Operations: monitor interval=60 (ipmi-fence-clst1-monitor-interval-60)

Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)

  Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1

  Operations: monitor interval=60 (ipmi-fence-clst2-monitor-interval-60)

Fencing Levels:

Location Constraints:

  Resource: DRBD_global_clst

    Disabled on: ventsi-clst1-sync (score:-INFINITY) (role: Started) (id:cli-ban-DRBD_global_clst-on-ventsi-clst1-sync)

  Resource: ipmi-fence-clst1

    Disabled on: ventsi-clst1-sync (score:-INFINITY) (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)

  Resource: ipmi-fence-clst2

    Disabled on: ventsi-clst2-sync (score:-INFINITY) (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)

Ordering Constraints:

  start IPaddrNFS then start NFSServer (kind:Mandatory) (id:order-IPaddrNFS-NFSServer-mandatory)

  promote DRBDClone then start DRBD_global_clst (kind:Mandatory) (id:order-DRBDClone-DRBD_global_clst-mandatory)

  start DRBD_global_clst then start IPaddrNFS (kind:Mandatory) (id:order-DRBD_global_clst-IPaddrNFS-mandatory)

  start NFSServer then start NFS_global_clst (kind:Mandatory) (id:order-NFSServer-NFS_global_clst-mandatory)

  start NFSServer then start BIND_global_clst (kind:Mandatory) (id:order-NFSServer-BIND_global_clst-mandatory)

Colocation Constraints:

  NFSServer with IPaddrNFS (score:INFINITY) (id:colocation-NFSServer-IPaddrNFS-INFINITY)

  IPaddrNFS with DRBD_global_clst (score:INFINITY) (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)

  NFS_global_clst with NFSServer (score:-INFINITY) (id:colocation-NFS_global_clst-NFSServer--INFINITY)

  BIND_global_clst with NFSServer (score:INFINITY) (id:colocation-BIND_global_clst-NFSServer-INFINITY)

  DRBD_global_clst with DRBDClone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)

Resources Defaults:

resource-stickiness: INFINITY

Operations Defaults:

timeout: 10s

Cluster Properties:

cluster-infrastructure: cman

dc-version: 1.1.14-8.el6-70404b0

have-watchdog: false

last-lrm-refresh: 1478703150

no-quorum-policy: ignore

stonith-enabled: true

symmetric-cluster: true

Node Attributes:

ventsi-clst1-sync: PostgresSon-data-status=DISCONNECT

ventsi-clst2-sync: PostgresSon-data-status=DISCONNECT
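To check from a script whether the master-role colocation discussed in this thread is actually in place, one can look for the `(with-rsc-role:Master)` marker on the DRBD_global_clst colocation line, as in the listing above. A minimal sketch, assuming the pcs 0.9 text format shown here; the function name `has_master_colocation` is hypothetical.

```shell
#!/bin/sh
# Succeed (exit 0) if the constraint listing on stdin colocates resource $1
# with the Master role of resource $2, i.e. the "$1 with $2 ..." line
# carries "(with-rsc-role:Master)".
# Intended input: 'pcs constraint show --full' output (pcs 0.9 text format).
has_master_colocation() {
    grep -F "$1 with $2 " | grep -q -F "(with-rsc-role:Master)"
}

# Example:
#   if pcs constraint show --full | has_master_colocation DRBD_global_clst DRBDClone; then
#       echo "colocation follows the DRBD master"
#   else
#       echo "colocation ignores the DRBD role"
#   fi
```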

Kind regards

Andi

-----Original Message-----
From: Ken Gaillot [mailto:kgaillot at redhat.com]
Sent: Tuesday, 8 November 2016 22:29
To: users at clusterlabs.org
Subject: Re: [ClusterLabs] DRBD demote/promote not called - Why? How to fix?

On 11/04/2016 01:57 PM, CART Andreas wrote:

> Hi

>

> I have a basic two-node active/passive cluster with Pacemaker (1.1.14, pcs: 0.9.148) / CMAN (3.0.12.1) / Corosync (1.4.7) on RHEL 6.8.

> This cluster runs NFS on top of DRBD (8.4.4).

>

> Basically the system works on both nodes, and I can switch the resources from one node to the other.

> But switching resources to the other node does not work if I try to move just one resource and have the others follow due to the colocation constraints.

>

> From the logged messages I see that in this "failure case" there is NO attempt to demote/promote the DRBD clone resource.

>

> Here is my setup:

> ==================================================================

> Cluster Name: clst1

> Corosync Nodes:

> ventsi-clst1-sync ventsi-clst2-sync

> Pacemaker Nodes:

> ventsi-clst1-sync ventsi-clst2-sync

>

> Resources:

> Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)

>   Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24

>   Operations: start interval=0s timeout=20s (IPaddrNFS-start-interval-0s)

>               stop interval=0s timeout=20s (IPaddrNFS-stop-interval-0s)

>               monitor interval=5s (IPaddrNFS-monitor-interval-5s)

> Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)

>   Attributes: nfs_shared_infodir=/var/lib/nfsserversettings/ nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"

>   Operations: start interval=0s timeout=40 (NFSServer-start-interval-0s)

>               stop interval=0s timeout=20s (NFSServer-stop-interval-0s)

>               monitor interval=10s timeout=20s (NFSServer-monitor-interval-10s)

> Master: DRBDClone

>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

>   Resource: DRBD (class=ocf provider=linbit type=drbd)

>    Attributes: drbd_resource=nfsdata

>    Operations: start interval=0s timeout=240 (DRBD-start-interval-0s)

>                promote interval=0s timeout=90 (DRBD-promote-interval-0s)

>                demote interval=0s timeout=90 (DRBD-demote-interval-0s)

>                stop interval=0s timeout=100 (DRBD-stop-interval-0s)

>                monitor interval=1s timeout=5 (DRBD-monitor-interval-1s)

> Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)

>   Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst fstype=ext4

>   Operations: start interval=0s timeout=60 (DRBD_global_clst-start-interval-0s)

>               stop interval=0s timeout=60 (DRBD_global_clst-stop-interval-0s)

>               monitor interval=20 timeout=40 (DRBD_global_clst-monitor-interval-20)

>

> Stonith Devices:

> Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)

>   Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1

>   Operations: monitor interval=60s (ipmi-fence-clst1-monitor-interval-60s)

> Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)

>   Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1

>   Operations: monitor interval=60s (ipmi-fence-clst2-monitor-interval-60s)

> Fencing Levels:

>

> Location Constraints:

>   Resource: ipmi-fence-clst1

>     Disabled on: ventsi-clst1-sync (score:-INFINITY) (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)

>   Resource: ipmi-fence-clst2

>     Disabled on: ventsi-clst2-sync (score:-INFINITY) (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)

> Ordering Constraints:

>   start IPaddrNFS then start NFSServer (kind:Mandatory) (id:order-IPaddrNFS-NFSServer-mandatory)

>   promote DRBDClone then start DRBD_global_clst (kind:Mandatory) (id:order-DRBDClone-DRBD_global_clst-mandatory)

>   start DRBD_global_clst then start IPaddrNFS (kind:Mandatory) (id:order-DRBD_global_clst-IPaddrNFS-mandatory)

> Colocation Constraints:

>   NFSServer with IPaddrNFS (score:INFINITY) (id:colocation-NFSServer-IPaddrNFS-INFINITY)

>   DRBD_global_clst with DRBDClone (score:INFINITY) (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)

It took me a while to notice it (it's easily overlooked), but the above constraint is the problem. It says DRBD_global_clst must be located where DRBDClone is running ... not necessarily where DRBDClone is master. This constraint should be created like this:

pcs constraint colocation add DRBD_global_clst with master DRBDClone
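A dry-run sketch of applying that change to this cluster follows. The `run` helper is hypothetical (it prints each command and only executes it when `APPLY=1`); the constraint id is the one shown in the configuration quoted above, and the syntax assumes pcs 0.9.x as used in this thread.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print each command, execute it only if APPLY=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    fi
}

# Remove the role-less colocation (id as listed in the configuration above).
run pcs constraint remove colocation-DRBD_global_clst-DRBDClone-INFINITY

# Re-create it against the Master role, so the filesystem may only run
# on the node where the DRBD clone is master.
run pcs constraint colocation add DRBD_global_clst with master DRBDClone INFINITY

# List constraints with ids; the colocation should now show (with-rsc-role:Master).
run pcs constraint show --full
```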

>   IPaddrNFS with DRBD_global_clst (score:INFINITY) (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)

>

> Resources Defaults:

> resource-stickiness: INFINITY

> Operations Defaults:

> timeout: 10s

>

> Cluster Properties:

> cluster-infrastructure: cman

> dc-version: 1.1.14-8.el6-70404b0

> have-watchdog: false

> last-lrm-refresh: 1478277432

> no-quorum-policy: ignore

> stonith-enabled: true

> symmetric-cluster: true

> ==================================================================

>

> Initial state is e.g. this (all resources at node1):

>

> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]

>

> Full list of resources:

>

> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started ventsi-clst2-sync

> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started ventsi-clst1-sync

> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync

> NFSServer      (ocf::heartbeat:nfsserver):     Started ventsi-clst1-sync

> Master/Slave Set: DRBDClone [DRBD]

>      Masters: [ ventsi-clst1-sync ]

>      Slaves: [ ventsi-clst2-sync ]

> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started ventsi-clst1-sync

> ==================================================================

>

> If I shut down the cluster on node1 ('pcs cluster stop') or move the DRBD clone resource ('pcs resource move DRBDClone'), all resources switch successfully to node2.

> I.e. the demote/promote of the DRBD clone resource works in these cases.

>

> But if I try to move any other resource (e.g. 'pcs resource move NFSServer'), the resources NFSServer, IPaddrNFS and DRBD_global_clst are stopped on node1, but then DRBD_global_clst is immediately started on node2, which fails due to the missing demote/promote.

> As far as I can see, there is then a follow-up attempt to partially repair things: the resources are started again on node1, except for the resource I moved with my move command.

>

> Final state is like this:

>

> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]

>

> Full list of resources:

>

> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started ventsi-clst2-sync

> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started ventsi-clst1-sync

> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync

> NFSServer      (ocf::heartbeat:nfsserver):     Stopped

> Master/Slave Set: DRBDClone [DRBD]

>      Masters: [ ventsi-clst1-sync ]

>      Slaves: [ ventsi-clst2-sync ]

> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started ventsi-clst1-sync

>

> Failed Actions:

> * DRBD_global_clst_start_0 on ventsi-clst2-sync 'unknown error' (1): call=778, status=complete, exitreason='none',

>     last-rc-change='Fri Nov  4 19:32:56 2016', queued=0ms, exec=43ms

> ==================================================================

>

> Here are the logged messages for this "failure case":

>

> 2016-11-04T19:32:55.163982+01:00 ventsi-clst1 crmd[6116]:   notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]

> 2016-11-04T19:32:55.168100+01:00 ventsi-clst1 pengine[6115]:   notice: On loss of CCM Quorum: Ignore

> 2016-11-04T19:32:55.181252+01:00 ventsi-clst1 pengine[6115]:   notice: Move    IPaddrNFS#011(Started ventsi-clst1-sync -> ventsi-clst2-sync)

> 2016-11-04T19:32:55.181260+01:00 ventsi-clst1 pengine[6115]:   notice: Move    NFSServer#011(Started ventsi-clst1-sync -> ventsi-clst2-sync)

> 2016-11-04T19:32:55.181278+01:00 ventsi-clst1 pengine[6115]:   notice: Move    DRBD_global_clst#011(Started ventsi-clst1-sync -> ventsi-clst2-sync)  <=== here no demote/promote is listed

> 2016-11-04T19:32:55.182385+01:00 ventsi-clst1 pengine[6115]:   notice: Calculated Transition 202: /var/lib/pacemaker/pengine/pe-input-766.bz2

> 2016-11-04T19:32:55.182998+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 15: stop NFSServer_stop_0 on ventsi-clst1-sync (local)

> 2016-11-04T19:32:55.196265+01:00 ventsi-clst1 nfsserver(NFSServer)[15978]: INFO: Stopping NFS server ...

> 2016-11-04T19:32:55.249137+01:00 ventsi-clst1 kernel: nfsd: last server has exited, flushing export cache

> 2016-11-04T19:32:55.252241+01:00 ventsi-clst1 rpc.mountd[15282]: Caught signal 15, un-registering and exiting.

> 2016-11-04T19:32:55.632708+01:00 ventsi-clst1 nfsserver(NFSServer)[15978]: INFO: Stopping sm-notify

> 2016-11-04T19:32:55.650552+01:00 ventsi-clst1 nfsserver(NFSServer)[15978]: INFO: Stopping rpc.statd

> 2016-11-04T19:32:55.666777+01:00 ventsi-clst1 rpc.statd[15243]: Caught signal 15, un-registering and exiting

> 2016-11-04T19:32:56.692819+01:00 ventsi-clst1 nfsserver(NFSServer)[15978]: INFO: NFS server stopped

> 2016-11-04T19:32:56.695523+01:00 ventsi-clst1 crmd[6116]:   notice: Operation NFSServer_stop_0: ok (node=ventsi-clst1-sync, call=1220, rc=0, cib-update=1695, confirmed=true)

> 2016-11-04T19:32:56.696243+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 12: stop IPaddrNFS_stop_0 on ventsi-clst1-sync (local)

> 2016-11-04T19:32:56.727882+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16108]: INFO: IP status = ok, IP_CIP=

> 2016-11-04T19:32:56.733383+01:00 ventsi-clst1 crmd[6116]:   notice: Operation IPaddrNFS_stop_0: ok (node=ventsi-clst1-sync, call=1222, rc=0, cib-update=1696, confirmed=true)

> 2016-11-04T19:32:56.733917+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 48: stop DRBD_global_clst_stop_0 on ventsi-clst1-sync (local)

> 2016-11-04T19:32:56.757181+01:00 ventsi-clst1 Filesystem(DRBD_global_clst)[16163]: INFO: Running stop for /dev/drbd1 on /drbdmnts/global_clst

> 2016-11-04T19:32:56.764684+01:00 ventsi-clst1 Filesystem(DRBD_global_clst)[16163]: INFO: Trying to unmount /drbdmnts/global_clst

> 2016-11-04T19:32:56.771260+01:00 ventsi-clst1 Filesystem(DRBD_global_clst)[16163]: INFO: unmounted /drbdmnts/global_clst successfully

> 2016-11-04T19:32:56.776640+01:00 ventsi-clst1 crmd[6116]:   notice: Operation DRBD_global_clst_stop_0: ok (node=ventsi-clst1-sync, call=1224, rc=0, cib-update=1697, confirmed=true)

> 2016-11-04T19:32:56.777140+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 49: start DRBD_global_clst_start_0 on ventsi-clst2-sync   <=== here is the attempt to start the filesystem on the other node, although DRBD has not yet been promoted

> 2016-11-04T19:32:56.840137+01:00 ventsi-clst1 crmd[6116]:  warning: Action 49 (DRBD_global_clst_start_0) on ventsi-clst2-sync failed (target: 0 vs. rc: 1): Error

> 2016-11-04T19:32:56.840158+01:00 ventsi-clst1 crmd[6116]:   notice: Transition aborted by DRBD_global_clst_start_0 'modify' on ventsi-clst2-sync: Event failed (magic=0:1;49:202:0:b7941532-c74b-40cc-a8ad-27b5502b8fba, cib=0.649.4, source=match_graph_event:381, 0)

> 2016-11-04T19:32:56.840232+01:00 ventsi-clst1 crmd[6116]:  warning: Action 49 (DRBD_global_clst_start_0) on ventsi-clst2-sync failed (target: 0 vs. rc: 1): Error

> 2016-11-04T19:32:56.840328+01:00 ventsi-clst1 crmd[6116]:   notice: Transition 202 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-input-766.bz2): Complete

> 2016-11-04T19:32:56.843693+01:00 ventsi-clst1 pengine[6115]:   notice: On loss of CCM Quorum: Ignore

> 2016-11-04T19:32:56.844072+01:00 ventsi-clst1 pengine[6115]:  warning: Processing failed op start for DRBD_global_clst on ventsi-clst2-sync: unknown error (1)

> 2016-11-04T19:32:56.844102+01:00 ventsi-clst1 pengine[6115]:  warning: Processing failed op start for DRBD_global_clst on ventsi-clst2-sync: unknown error (1)

> 2016-11-04T19:32:56.845071+01:00 ventsi-clst1 pengine[6115]:   notice: Start   IPaddrNFS#011(ventsi-clst2-sync)

> 2016-11-04T19:32:56.845078+01:00 ventsi-clst1 pengine[6115]:   notice: Start   NFSServer#011(ventsi-clst2-sync)

> 2016-11-04T19:32:56.845081+01:00 ventsi-clst1 pengine[6115]:   notice: Demote  DRBD:0#011(Master -> Slave ventsi-clst1-sync)   <=== here would be the necessary demote/promote ... but it's too late; the start of the filesystem has already failed...

> 2016-11-04T19:32:56.845083+01:00 ventsi-clst1 pengine[6115]:   notice: Promote DRBD:1#011(Slave -> Master ventsi-clst2-sync)

> 2016-11-04T19:32:56.845084+01:00 ventsi-clst1 pengine[6115]:   notice: Recover DRBD_global_clst#011(Started ventsi-clst2-sync)

> 2016-11-04T19:32:56.847986+01:00 ventsi-clst1 pengine[6115]:   notice: Calculated Transition 203: /var/lib/pacemaker/pengine/pe-input-767.bz2   <=== ... so the above transition is superseded by the following attempt to partially repair things

> 2016-11-04T19:32:56.867679+01:00 ventsi-clst1 pengine[6115]:   notice: On loss of CCM Quorum: Ignore

> 2016-11-04T19:32:56.868074+01:00 ventsi-clst1 pengine[6115]:  warning: Processing failed op start for DRBD_global_clst on ventsi-clst2-sync: unknown error (1)

> 2016-11-04T19:32:56.868101+01:00 ventsi-clst1 pengine[6115]:  warning: Processing failed op start for DRBD_global_clst on ventsi-clst2-sync: unknown error (1)

> 2016-11-04T19:32:56.868287+01:00 ventsi-clst1 pengine[6115]:  warning: Forcing DRBD_global_clst away from ventsi-clst2-sync after 1000000 failures (max=1000000)

> 2016-11-04T19:32:56.869011+01:00 ventsi-clst1 pengine[6115]:   notice: Start   IPaddrNFS#011(ventsi-clst1-sync)

> 2016-11-04T19:32:56.869023+01:00 ventsi-clst1 pengine[6115]:   notice: Recover DRBD_global_clst#011(Started ventsi-clst2-sync -> ventsi-clst1-sync)

> 2016-11-04T19:32:56.869770+01:00 ventsi-clst1 pengine[6115]:   notice: Calculated Transition 204: /var/lib/pacemaker/pengine/pe-input-768.bz2

> 2016-11-04T19:32:56.870065+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 3: stop DRBD_global_clst_stop_0 on ventsi-clst2-sync

> 2016-11-04T19:32:56.908075+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 42: start DRBD_global_clst_start_0 on ventsi-clst1-sync (local)

> 2016-11-04T19:32:56.931072+01:00 ventsi-clst1 Filesystem(DRBD_global_clst)[16242]: INFO: Running start for /dev/drbd1 on /drbdmnts/global_clst

> 2016-11-04T19:32:56.943250+01:00 ventsi-clst1 kernel: EXT4-fs (drbd1): warning: maximal mount count reached, running e2fsck is recommended

> 2016-11-04T19:32:56.953253+01:00 ventsi-clst1 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:

> 2016-11-04T19:32:56.964284+01:00 ventsi-clst1 crmd[6116]:   notice: Operation DRBD_global_clst_start_0: ok (node=ventsi-clst1-sync, call=1225, rc=0, cib-update=1701, confirmed=true)

> 2016-11-04T19:32:56.965104+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 10: start IPaddrNFS_start_0 on ventsi-clst1-sync (local)

> 2016-11-04T19:32:56.965325+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 43: monitor DRBD_global_clst_monitor_20000 on ventsi-clst1-sync (local)

> 2016-11-04T19:32:56.996235+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]: INFO: Adding inet address xxx.xxx.xxx.xxx/24 with broadcast address xxx.xxx.xxx.255 to device bond0

> 2016-11-04T19:32:57.002059+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]: INFO: Bringing device bond0 up

> 2016-11-04T19:32:57.008128+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-xxx.xxx.xxx.xxx bond0 xxx.xxx.xxx.xxx auto not_used not_used

> 2016-11-04T19:32:57.020159+01:00 ventsi-clst1 crmd[6116]:   notice: Operation IPaddrNFS_start_0: ok (node=ventsi-clst1-sync, call=1226, rc=0, cib-update=1703, confirmed=true)

> 2016-11-04T19:32:57.020901+01:00 ventsi-clst1 crmd[6116]:   notice: Initiating action 11: monitor IPaddrNFS_monitor_5000 on ventsi-clst1-sync (local)

> 2016-11-04T19:32:57.052231+01:00 ventsi-clst1 crmd[6116]:   notice: Transition 204 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-768.bz2): Complete

> 2016-11-04T19:32:57.052251+01:00 ventsi-clst1 crmd[6116]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

> ==================================================================

>

> Any ideas what could be the reason for this behavior?

> And how could this be fixed?

>

>

> (I already found several articles on the internet recommending two separately configured monitor operations for the DRBD resource, one for the master role and another one for the slave role. I already tried this, to no avail.)

>

> Regards

> Andi
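For reference, the role-specific monitors mentioned in the quoted message could be set up roughly as follows; the values mirror the updated configuration earlier in this thread (interval=9 for Master, interval=10 for Slave), and as reported there, this alone did not bring back the missing demote. Dry-run sketch: the `run` helper is hypothetical, the operation id comes from the quoted configuration, and the `pcs resource op remove`/`op add` syntax assumes pcs 0.9.x.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print each command, execute it only if APPLY=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    fi
}

# Drop the single role-less 1s monitor (operation id from the quoted configuration).
run pcs resource op remove DRBD-monitor-interval-1s

# One monitor per role; Pacemaker needs the two monitor intervals to differ.
run pcs resource op add DRBD monitor interval=9 role=Master timeout=5
run pcs resource op add DRBD monitor interval=10 role=Slave timeout=5
```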

_______________________________________________

Users mailing list: Users at clusterlabs.org

http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org

Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Bugs: http://bugs.clusterlabs.org