[ClusterLabs] DRBD demote/promote not called - Why? How to fix?
    CART Andreas 
    andreas.cart at sonorys.at
       
    Wed Nov  9 18:27:39 UTC 2016
    
    
  
Hi again
Sorry for overlooking that the master role was missing from the colocation constraint.
I have added it, but unfortunately still without success.
(In the meantime I added 2 additional filesystem resources on top of the NFSServer, but that should not change anything about the root problem, namely that the demote of DRBDClone is missing.)
I again started with all resources located at ventsi-clst1 and issued a 'pcs resource move DRBD_global_clst' (the resource colocated directly with DRBDClone).
With that I end up with all primitive resources stopped and the DRBDClone resource still being master at ventsi-clst1.
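For reference, 'pcs resource move' without a destination node works by adding a -INFINITY location constraint for the node the resource currently runs on; it shows up in the config further down as cli-ban-DRBD_global_clst-on-ventsi-clst1-sync. A minimal sketch for spotting such leftover constraints, assuming the id:cli-ban-... / id:cli-prefer-... naming seen in the pcs output of this thread; the helper name list_cli_constraints is made up for illustration:

```shell
# Hypothetical helper: read 'pcs constraint --full' (or 'pcs config') output
# on stdin and print the ids of constraints created by 'pcs resource move'
# or 'pcs resource ban'.
list_cli_constraints() {
    grep -oE 'id:cli-(ban|prefer)-[^)]+' | sed 's/^id://'
}

# On a live cluster (not executed here):
#   pcs constraint --full | list_cli_constraints
#   pcs resource clear DRBD_global_clst   # removes the move/ban constraint again
```

The helper only parses text, so it can be tried on a saved copy of the config before touching the cluster.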
Here is what Pacemaker says has to be done:
==================================================================
[root@ventsi-clst2 ~]# crm_simulate -Ls
Current cluster status:
Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
ipmi-fence-clst1       (stonith:fence_ipmilan):        Started ventsi-clst2-sync
ipmi-fence-clst2       (stonith:fence_ipmilan):        Started ventsi-clst1-sync
IPaddrNFS      (ocf::heartbeat:IPaddr2):       Stopped
NFSServer      (ocf::heartbeat:nfsserver):     Stopped
Master/Slave Set: DRBDClone [DRBD]
     Masters: [ ventsi-clst1-sync ]    <=== still not demoted
     Slaves: [ ventsi-clst2-sync ]
DRBD_global_clst       (ocf::heartbeat:Filesystem):    Stopped
NFS_global_clst        (ocf::heartbeat:Filesystem):    Stopped
BIND_global_clst       (ocf::heartbeat:Filesystem):    Stopped
Allocation scores:
native_color: ipmi-fence-clst1 allocation score on ventsi-clst1-sync: -INFINITY
native_color: ipmi-fence-clst1 allocation score on ventsi-clst2-sync: INFINITY
native_color: ipmi-fence-clst2 allocation score on ventsi-clst1-sync: INFINITY
native_color: ipmi-fence-clst2 allocation score on ventsi-clst2-sync: -INFINITY
clone_color: DRBDClone allocation score on ventsi-clst1-sync: 0
clone_color: DRBDClone allocation score on ventsi-clst2-sync: 0
clone_color: DRBD:0 allocation score on ventsi-clst1-sync: INFINITY
clone_color: DRBD:0 allocation score on ventsi-clst2-sync: 0
clone_color: DRBD:1 allocation score on ventsi-clst1-sync: 0
clone_color: DRBD:1 allocation score on ventsi-clst2-sync: INFINITY
native_color: DRBD:0 allocation score on ventsi-clst1-sync: INFINITY
native_color: DRBD:0 allocation score on ventsi-clst2-sync: 0
native_color: DRBD:1 allocation score on ventsi-clst1-sync: -INFINITY
native_color: DRBD:1 allocation score on ventsi-clst2-sync: INFINITY
DRBD:1 promotion score on ventsi-clst2-sync: 10000
DRBD:0 promotion score on ventsi-clst1-sync: 1
native_color: DRBD_global_clst allocation score on ventsi-clst1-sync: -INFINITY
native_color: DRBD_global_clst allocation score on ventsi-clst2-sync: INFINITY
native_color: IPaddrNFS allocation score on ventsi-clst1-sync: -INFINITY
native_color: IPaddrNFS allocation score on ventsi-clst2-sync: 0
native_color: NFSServer allocation score on ventsi-clst1-sync: -INFINITY
native_color: NFSServer allocation score on ventsi-clst2-sync: 0
native_color: NFS_global_clst allocation score on ventsi-clst1-sync: 0
native_color: NFS_global_clst allocation score on ventsi-clst2-sync: -INFINITY
native_color: BIND_global_clst allocation score on ventsi-clst1-sync: -INFINITY
native_color: BIND_global_clst allocation score on ventsi-clst2-sync: 0
Transition Summary:
* Start   IPaddrNFS    (ventsi-clst2-sync)
* Start   NFSServer    (ventsi-clst2-sync)
* Demote  DRBD:0       (Master -> Slave ventsi-clst1-sync)    <=== this demote never happens
* Promote DRBD:1       (Slave -> Master ventsi-clst2-sync)
* Start   DRBD_global_clst     (ventsi-clst2-sync)
* Start   NFS_global_clst      (ventsi-clst1-sync)
* Start   BIND_global_clst     (ventsi-clst2-sync)
And this is the executed transition:
==================================================================
[root@ventsi-clst2 ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-1157.bz2 --save-graph problem5.graph --save-dotfile problem5.dot -V --simulate
Using the original execution date of: 2016-11-09 17:54:10Z
Current cluster status:
Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
ipmi-fence-clst1       (stonith:fence_ipmilan):        Started ventsi-clst2-sync
ipmi-fence-clst2       (stonith:fence_ipmilan):        Started ventsi-clst1-sync
IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync
NFSServer      (ocf::heartbeat:nfsserver):     Started ventsi-clst1-sync
Master/Slave Set: DRBDClone [DRBD]
     Masters: [ ventsi-clst1-sync ]
     Slaves: [ ventsi-clst2-sync ]
DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started ventsi-clst1-sync
NFS_global_clst        (ocf::heartbeat:Filesystem):    Started ventsi-clst2-sync
BIND_global_clst       (ocf::heartbeat:Filesystem):    Started ventsi-clst1-sync
Transition Summary:
* Stop    IPaddrNFS    (ventsi-clst1-sync)
* Stop    NFSServer    (ventsi-clst1-sync)
* Stop    DRBD_global_clst     (ventsi-clst1-sync)
* Stop    NFS_global_clst      (Started ventsi-clst2-sync)
* Stop    BIND_global_clst     (ventsi-clst1-sync)
Executing cluster transition:
* Resource action: NFS_global_clst stop on ventsi-clst2-sync
* Resource action: BIND_global_clst stop on ventsi-clst1-sync
* Resource action: NFSServer       stop on ventsi-clst1-sync
* Resource action: IPaddrNFS       stop on ventsi-clst1-sync
* Resource action: DRBD_global_clst stop on ventsi-clst1-sync
* Pseudo action:   all_stopped    <=== no demote
Using the original execution date of: 2016-11-09 17:54:10Z
Revised cluster status:
Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
ipmi-fence-clst1       (stonith:fence_ipmilan):        Started ventsi-clst2-sync
ipmi-fence-clst2       (stonith:fence_ipmilan):        Started ventsi-clst1-sync
IPaddrNFS      (ocf::heartbeat:IPaddr2):       Stopped
NFSServer      (ocf::heartbeat:nfsserver):     Stopped
Master/Slave Set: DRBDClone [DRBD]
     Masters: [ ventsi-clst1-sync ]
     Slaves: [ ventsi-clst2-sync ]
DRBD_global_clst       (ocf::heartbeat:Filesystem):    Stopped
NFS_global_clst        (ocf::heartbeat:Filesystem):    Stopped
BIND_global_clst       (ocf::heartbeat:Filesystem):    Stopped
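To compare what the policy engine plans against what a saved transition actually contained, it can help to reduce the crm_simulate output to the role changes only. A minimal sketch, assuming the "* Demote" / "* Promote" line format shown in the transition summaries above; role_changes is a hypothetical helper name:

```shell
# Hypothetical helper: keep only the Demote/Promote lines of a
# crm_simulate transition summary read from stdin.
# '|| true' keeps the exit status 0 when nothing matches.
role_changes() {
    grep -E '^[[:space:]]*\*[[:space:]]+(Demote|Promote)[[:space:]]' || true
}

# On a cluster node (not executed here):
#   crm_simulate -Ls | role_changes
#   crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-1157.bz2 | role_changes
```

An empty result for a saved input would indicate that no demote or promote was part of that transition, which matches the executed transition shown above.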
And finally, here is the updated config:
==================================================================
[root@ventsi-clst1 ~]# pcs config
Cluster Name: clst1
Corosync Nodes:
ventsi-clst1-sync ventsi-clst2-sync
Pacemaker Nodes:
ventsi-clst1-sync ventsi-clst2-sync
Resources:
Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24
  Operations: start interval=0 timeout=20 (IPaddrNFS-start-interval-0)
              stop interval=0 timeout=20 (IPaddrNFS-stop-interval-0)
              monitor interval=10 timeout=20 (IPaddrNFS-monitor-interval-10)
Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)
  Attributes: nfs_shared_infodir=/drbdmnts/global_clst/nfsserversettings/ nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"
  Operations: start interval=0 timeout=40 (NFSServer-start-interval-0)
              stop interval=0 timeout=20 (NFSServer-stop-interval-0)
              monitor interval=10 timeout=20 (NFSServer-monitor-interval-10)
Master: DRBDClone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: DRBD (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=nfsdata
   Operations: start interval=0 timeout=240 (DRBD-start-interval-0)
               promote interval=0 timeout=90 (DRBD-promote-interval-0)
               demote interval=0 timeout=90 (DRBD-demote-interval-0)
               stop interval=0 timeout=100 (DRBD-stop-interval-0)
               monitor interval=9 role=Master timeout=5 (DRBD-monitor-interval-9)
               monitor interval=10 role=Slave timeout=5 (DRBD-monitor-interval-10)
Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst fstype=ext4
  Operations: start interval=0 timeout=60 (DRBD_global_clst-start-interval-0)
              stop interval=0 timeout=60 (DRBD_global_clst-stop-interval-0)
              monitor interval=20 timeout=40 (DRBD_global_clst-monitor-interval-20)
Resource: NFS_global_clst (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=xxx.xxx.xxx.xxx:/drbdmnts/global_clst/nfs directory=/global/nfs fstype=nfs
  Operations: start interval=0 timeout=60 (NFS_global_clst-start-interval-0)
              stop interval=0 timeout=60 (NFS_global_clst-stop-interval-0)
              monitor interval=20 timeout=40 (NFS_global_clst-monitor-interval-20)
Resource: BIND_global_clst (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/drbdmnts/global_clst/nfs directory=/global/nfs fstype=none options=bind
  Operations: start interval=0 timeout=60 (BIND_global_clst-start-interval-0)
              stop interval=0 timeout=60 (BIND_global_clst-stop-interval-0)
              monitor interval=20 timeout=40 (BIND_global_clst-monitor-interval-20)
Stonith Devices:
Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)
  Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1
  Operations: monitor interval=60 (ipmi-fence-clst1-monitor-interval-60)
Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)
  Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1
  Operations: monitor interval=60 (ipmi-fence-clst2-monitor-interval-60)
Fencing Levels:
Location Constraints:
  Resource: DRBD_global_clst
    Disabled on: ventsi-clst1-sync (score:-INFINITY) (role: Started) (id:cli-ban-DRBD_global_clst-on-ventsi-clst1-sync)
  Resource: ipmi-fence-clst1
    Disabled on: ventsi-clst1-sync (score:-INFINITY) (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)
  Resource: ipmi-fence-clst2
    Disabled on: ventsi-clst2-sync (score:-INFINITY) (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)
Ordering Constraints:
  start IPaddrNFS then start NFSServer (kind:Mandatory) (id:order-IPaddrNFS-NFSServer-mandatory)
  promote DRBDClone then start DRBD_global_clst (kind:Mandatory) (id:order-DRBDClone-DRBD_global_clst-mandatory)
  start DRBD_global_clst then start IPaddrNFS (kind:Mandatory) (id:order-DRBD_global_clst-IPaddrNFS-mandatory)
  start NFSServer then start NFS_global_clst (kind:Mandatory) (id:order-NFSServer-NFS_global_clst-mandatory)
  start NFSServer then start BIND_global_clst (kind:Mandatory) (id:order-NFSServer-BIND_global_clst-mandatory)
Colocation Constraints:
  NFSServer with IPaddrNFS (score:INFINITY) (id:colocation-NFSServer-IPaddrNFS-INFINITY)
  IPaddrNFS with DRBD_global_clst (score:INFINITY) (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)
  NFS_global_clst with NFSServer (score:-INFINITY) (id:colocation-NFS_global_clst-NFSServer--INFINITY)
  BIND_global_clst with NFSServer (score:INFINITY) (id:colocation-BIND_global_clst-NFSServer-INFINITY)
  DRBD_global_clst with DRBDClone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)
Resources Defaults:
resource-stickiness: INFINITY
Operations Defaults:
timeout: 10s
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.14-8.el6-70404b0
have-watchdog: false
last-lrm-refresh: 1478703150
no-quorum-policy: ignore
stonith-enabled: true
symmetric-cluster: true
Node Attributes:
ventsi-clst1-sync: PostgresSon-data-status=DISCONNECT
ventsi-clst2-sync: PostgresSon-data-status=DISCONNECT
Kind regards
Andi
-----Original Message-----
From: Ken Gaillot [mailto:kgaillot at redhat.com]
Sent: Tuesday, 8 November 2016 22:29
To: users at clusterlabs.org
Subject: Re: [ClusterLabs] DRBD demote/promote not called - Why? How to fix?
On 11/04/2016 01:57 PM, CART Andreas wrote:
> Hi
>
> I have a basic 2 node active/passive cluster with Pacemaker (1.1.14 ,
> pcs: 0.9.148) / CMAN (3.0.12.1) / Corosync (1.4.7) on RHEL 6.8.
> This cluster runs NFS on top of DRBD (8.4.4).
>
> Basically the system is working on both nodes and I can switch the
> resources from one node to the other.
> But switching resources to the other node does not work if I try to
> move just one resource and have the others follow due to the colocation
> constraints.
>
> From the logged messages I see that in this "failure case" there is NO
> attempt to demote/promote the DRBD clone resource.
>
> Here is my setup:
> ==================================================================
> Cluster Name: clst1
> Corosync Nodes:
> ventsi-clst1-sync ventsi-clst2-sync
> Pacemaker Nodes:
> ventsi-clst1-sync ventsi-clst2-sync
>
> Resources:
> Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24
>   Operations: start interval=0s timeout=20s (IPaddrNFS-start-interval-0s)
>               stop interval=0s timeout=20s (IPaddrNFS-stop-interval-0s)
>               monitor interval=5s (IPaddrNFS-monitor-interval-5s)
> Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)
>   Attributes: nfs_shared_infodir=/var/lib/nfsserversettings/
> nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"
>   Operations: start interval=0s timeout=40 (NFSServer-start-interval-0s)
>               stop interval=0s timeout=20s (NFSServer-stop-interval-0s)
>               monitor interval=10s timeout=20s
> (NFSServer-monitor-interval-10s)
> Master: DRBDClone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
>   Resource: DRBD (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=nfsdata
>    Operations: start interval=0s timeout=240 (DRBD-start-interval-0s)
>                promote interval=0s timeout=90 (DRBD-promote-interval-0s)
>                demote interval=0s timeout=90 (DRBD-demote-interval-0s)
>                stop interval=0s timeout=100 (DRBD-stop-interval-0s)
>                monitor interval=1s timeout=5 (DRBD-monitor-interval-1s)
> Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst fstype=ext4
>   Operations: start interval=0s timeout=60
> (DRBD_global_clst-start-interval-0s)
>               stop interval=0s timeout=60
> (DRBD_global_clst-stop-interval-0s)
>               monitor interval=20 timeout=40
> (DRBD_global_clst-monitor-interval-20)
>
> Stonith Devices:
> Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)
>   Attributes: lanplus=1 login=foo passwd=bar action=reboot
> ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list
> pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1
>   Operations: monitor interval=60s (ipmi-fence-clst1-monitor-interval-60s)
> Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)
>   Attributes: lanplus=1 login=foo passwd=bar action=reboot
> ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list
> pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1
>   Operations: monitor interval=60s (ipmi-fence-clst2-monitor-interval-60s)
> Fencing Levels:
>
> Location Constraints:
>   Resource: ipmi-fence-clst1
>     Disabled on: ventsi-clst1-sync (score:-INFINITY)
> (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)
>   Resource: ipmi-fence-clst2
>     Disabled on: ventsi-clst2-sync (score:-INFINITY)
> (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)
> Ordering Constraints:
>   start IPaddrNFS then start NFSServer (kind:Mandatory)
> (id:order-IPaddrNFS-NFSServer-mandatory)
>   promote DRBDClone then start DRBD_global_clst (kind:Mandatory)
> (id:order-DRBDClone-DRBD_global_clst-mandatory)
>   start DRBD_global_clst then start IPaddrNFS (kind:Mandatory)
> (id:order-DRBD_global_clst-IPaddrNFS-mandatory)
> Colocation Constraints:
>   NFSServer with IPaddrNFS (score:INFINITY)
> (id:colocation-NFSServer-IPaddrNFS-INFINITY)
>   DRBD_global_clst with DRBDClone (score:INFINITY)
> (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)
It took me a while to notice it, it's easily overlooked, but the above
constraint is the problem. It says DRBD_global_clst must be located
where DRBDClone is running ... not necessarily where DRBDClone is
master. This constraint should be created like this:
pcs constraint colocation add DRBD_global_clst with master DRBDClone
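A quick way to check whether a colocation constraint is bound to the master role, sketched under the assumption that pcs prints role-aware constraints with a with-rsc-role:Master field (as in the updated config earlier in this message); colocated_with_master is a hypothetical helper, and the constraint id in the comments is the one from the quoted config:

```shell
# Hypothetical helper: succeed if the colocation line for "$1 with $2"
# (read from 'pcs config' or 'pcs constraint --full' on stdin) names the
# master role of $2.
colocated_with_master() {
    grep -Eq "^[[:space:]>]*$1 with $2 .*with-rsc-role:Master"
}

# Replacing the role-less constraint on a live cluster (not executed here):
#   pcs constraint remove colocation-DRBD_global_clst-DRBDClone-INFINITY
#   pcs constraint colocation add DRBD_global_clst with master DRBDClone INFINITY
#   pcs config | colocated_with_master DRBD_global_clst DRBDClone && echo "bound to master"
```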
>   IPaddrNFS with DRBD_global_clst (score:INFINITY)
> (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)
>
> Resources Defaults:
> resource-stickiness: INFINITY
> Operations Defaults:
> timeout: 10s
>
> Cluster Properties:
> cluster-infrastructure: cman
> dc-version: 1.1.14-8.el6-70404b0
> have-watchdog: false
> last-lrm-refresh: 1478277432
> no-quorum-policy: ignore
> stonith-enabled: true
> symmetric-cluster: true
> ==================================================================
>
> Initial state is e.g. this (all resources at node1):
>
> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
>
> Full list of resources:
>
> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started
> ventsi-clst2-sync
> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started
> ventsi-clst1-sync
> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync
> NFSServer      (ocf::heartbeat:nfsserver):     Started ventsi-clst1-sync
> Master/Slave Set: DRBDClone [DRBD]
>      Masters: [ ventsi-clst1-sync ]
>      Slaves: [ ventsi-clst2-sync ]
> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started
> ventsi-clst1-sync
> ==================================================================
>
> If I shut down the cluster at node 1 ('pcs cluster stop') or if I move
> the DRBD clone resource ('pcs resource move DRBDClone') all resources
> switch successfully to node2.
> I.e. the demote/promote of the DRBD clone resource is working in these
> cases.
>
> But if I try to move any other resource (e.g. 'pcs resource move
> NFSServer'), the resources NFSServer, IPaddrNFS and DRBD_global_clst are
> stopped at node 1, and then DRBD_global_clst is immediately started
> at node2, which fails due to the missing demote/promote.
> As far as I can see there is some follow-up attempt to repair things
> partially, as the resources are started again at node1, except for the
> resource which I moved with my move command.
>
> Final state is like this:
>
> Online: [ ventsi-clst1-sync ventsi-clst2-sync ]
>
> Full list of resources:
>
> ipmi-fence-clst1       (stonith:fence_ipmilan):        Started
> ventsi-clst2-sync
> ipmi-fence-clst2       (stonith:fence_ipmilan):        Started
> ventsi-clst1-sync
> IPaddrNFS      (ocf::heartbeat:IPaddr2):       Started ventsi-clst1-sync
> NFSServer      (ocf::heartbeat:nfsserver):     Stopped
> Master/Slave Set: DRBDClone [DRBD]
>      Masters: [ ventsi-clst1-sync ]
>      Slaves: [ ventsi-clst2-sync ]
> DRBD_global_clst       (ocf::heartbeat:Filesystem):    Started
> ventsi-clst1-sync
>
> Failed Actions:
> * DRBD_global_clst_start_0 on ventsi-clst2-sync 'unknown error' (1):
> call=778, status=complete, exitreason='none',
>     last-rc-change='Fri Nov  4 19:32:56 2016', queued=0ms, exec=43ms
> ==================================================================
>
> Here are the logged messages for this "failure case":
>
> 2016-11-04T19:32:55.163982+01:00 ventsi-clst1 crmd[6116]:   notice:
> State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> 2016-11-04T19:32:55.168100+01:00 ventsi-clst1 pengine[6115]:   notice:
> On loss of CCM Quorum: Ignore
> 2016-11-04T19:32:55.181252+01:00 ventsi-clst1 pengine[6115]:   notice:
> Move    IPaddrNFS#011(Started ventsi-clst1-sync -> ventsi-clst2-sync)
> 2016-11-04T19:32:55.181260+01:00 ventsi-clst1 pengine[6115]:   notice:
> Move    NFSServer#011(Started ventsi-clst1-sync -> ventsi-clst2-sync)
> 2016-11-04T19:32:55.181278+01:00 ventsi-clst1 pengine[6115]:   notice:
> Move    DRBD_global_clst#011(Started ventsi-clst1-sync ->
> ventsi-clst2-sync)  <=== here no demote/promote is listed
> 2016-11-04T19:32:55.182385+01:00 ventsi-clst1 pengine[6115]:   notice:
> Calculated Transition 202: /var/lib/pacemaker/pengine/pe-input-766.bz2
> 2016-11-04T19:32:55.182998+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 15: stop NFSServer_stop_0 on ventsi-clst1-sync (local)
> 2016-11-04T19:32:55.196265+01:00 ventsi-clst1
> nfsserver(NFSServer)[15978]: INFO: Stopping NFS server ...
> 2016-11-04T19:32:55.249137+01:00 ventsi-clst1 kernel: nfsd: last server
> has exited, flushing export cache
> 2016-11-04T19:32:55.252241+01:00 ventsi-clst1 rpc.mountd[15282]: Caught
> signal 15, un-registering and exiting.
> 2016-11-04T19:32:55.632708+01:00 ventsi-clst1
> nfsserver(NFSServer)[15978]: INFO: Stopping sm-notify
> 2016-11-04T19:32:55.650552+01:00 ventsi-clst1
> nfsserver(NFSServer)[15978]: INFO: Stopping rpc.statd
> 2016-11-04T19:32:55.666777+01:00 ventsi-clst1 rpc.statd[15243]: Caught
> signal 15, un-registering and exiting
> 2016-11-04T19:32:56.692819+01:00 ventsi-clst1
> nfsserver(NFSServer)[15978]: INFO: NFS server stopped
> 2016-11-04T19:32:56.695523+01:00 ventsi-clst1 crmd[6116]:   notice:
> Operation NFSServer_stop_0: ok (node=ventsi-clst1-sync, call=1220, rc=0,
> cib-update=1695, confirmed=true)
> 2016-11-04T19:32:56.696243+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 12: stop IPaddrNFS_stop_0 on ventsi-clst1-sync (local)
> 2016-11-04T19:32:56.727882+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16108]:
> INFO: IP status = ok, IP_CIP=
> 2016-11-04T19:32:56.733383+01:00 ventsi-clst1 crmd[6116]:   notice:
> Operation IPaddrNFS_stop_0: ok (node=ventsi-clst1-sync, call=1222, rc=0,
> cib-update=1696, confirmed=true)
> 2016-11-04T19:32:56.733917+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 48: stop DRBD_global_clst_stop_0 on ventsi-clst1-sync
> (local)
> 2016-11-04T19:32:56.757181+01:00 ventsi-clst1
> Filesystem(DRBD_global_clst)[16163]: INFO: Running stop for /dev/drbd1
> on /drbdmnts/global_clst
> 2016-11-04T19:32:56.764684+01:00 ventsi-clst1
> Filesystem(DRBD_global_clst)[16163]: INFO: Trying to unmount
> /drbdmnts/global_clst
> 2016-11-04T19:32:56.771260+01:00 ventsi-clst1
> Filesystem(DRBD_global_clst)[16163]: INFO: unmounted
> /drbdmnts/global_clst successfully
> 2016-11-04T19:32:56.776640+01:00 ventsi-clst1 crmd[6116]:   notice:
> Operation DRBD_global_clst_stop_0: ok (node=ventsi-clst1-sync,
> call=1224, rc=0, cib-update=1697, confirmed=true)
> 2016-11-04T19:32:56.777140+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 49: start DRBD_global_clst_start_0 on
> ventsi-clst2-sync   <=== here is the attempt to start the filesystem at
> the other node, although DRBD has not yet been promoted
> 2016-11-04T19:32:56.840137+01:00 ventsi-clst1 crmd[6116]:  warning:
> Action 49 (DRBD_global_clst_start_0) on ventsi-clst2-sync failed
> (target: 0 vs. rc: 1): Error
> 2016-11-04T19:32:56.840158+01:00 ventsi-clst1 crmd[6116]:   notice:
> Transition aborted by DRBD_global_clst_start_0 'modify' on
> ventsi-clst2-sync: Event failed
> (magic=0:1;49:202:0:b7941532-c74b-40cc-a8ad-27b5502b8fba, cib=0.649.4,
> source=match_graph_event:381, 0)
> 2016-11-04T19:32:56.840232+01:00 ventsi-clst1 crmd[6116]:  warning:
> Action 49 (DRBD_global_clst_start_0) on ventsi-clst2-sync failed
> (target: 0 vs. rc: 1): Error
> 2016-11-04T19:32:56.840328+01:00 ventsi-clst1 crmd[6116]:   notice:
> Transition 202 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=5,
> Source=/var/lib/pacemaker/pengine/pe-input-766.bz2): Complete
> 2016-11-04T19:32:56.843693+01:00 ventsi-clst1 pengine[6115]:   notice:
> On loss of CCM Quorum: Ignore
> 2016-11-04T19:32:56.844072+01:00 ventsi-clst1 pengine[6115]:  warning:
> Processing failed op start for DRBD_global_clst on ventsi-clst2-sync:
> unknown error (1)
> 2016-11-04T19:32:56.844102+01:00 ventsi-clst1 pengine[6115]:  warning:
> Processing failed op start for DRBD_global_clst on ventsi-clst2-sync:
> unknown error (1)
> 2016-11-04T19:32:56.845071+01:00 ventsi-clst1 pengine[6115]:   notice:
> Start   IPaddrNFS#011(ventsi-clst2-sync)
> 2016-11-04T19:32:56.845078+01:00 ventsi-clst1 pengine[6115]:   notice:
> Start   NFSServer#011(ventsi-clst2-sync)
> 2016-11-04T19:32:56.845081+01:00 ventsi-clst1 pengine[6115]:   notice:
> Demote  DRBD:0#011(Master -> Slave ventsi-clst1-sync)   <=== here there
> would be the necessary demote/promote ... but it's too late; the start of
> the filesystem already failed...
> 2016-11-04T19:32:56.845083+01:00 ventsi-clst1 pengine[6115]:   notice:
> Promote DRBD:1#011(Slave -> Master ventsi-clst2-sync)
> 2016-11-04T19:32:56.845084+01:00 ventsi-clst1 pengine[6115]:   notice:
> Recover DRBD_global_clst#011(Started ventsi-clst2-sync)
> 2016-11-04T19:32:56.847986+01:00 ventsi-clst1 pengine[6115]:   notice:
> Calculated Transition 203: /var/lib/pacemaker/pengine/pe-input-767.bz2
> <=== ... so the above transition gets caught by the following attempt to
> repair things partially
> 2016-11-04T19:32:56.867679+01:00 ventsi-clst1 pengine[6115]:   notice:
> On loss of CCM Quorum: Ignore
> 2016-11-04T19:32:56.868074+01:00 ventsi-clst1 pengine[6115]:  warning:
> Processing failed op start for DRBD_global_clst on ventsi-clst2-sync:
> unknown error (1)
> 2016-11-04T19:32:56.868101+01:00 ventsi-clst1 pengine[6115]:  warning:
> Processing failed op start for DRBD_global_clst on ventsi-clst2-sync:
> unknown error (1)
> 2016-11-04T19:32:56.868287+01:00 ventsi-clst1 pengine[6115]:  warning:
> Forcing DRBD_global_clst away from ventsi-clst2-sync after 1000000
> failures (max=1000000)
> 2016-11-04T19:32:56.869011+01:00 ventsi-clst1 pengine[6115]:   notice:
> Start   IPaddrNFS#011(ventsi-clst1-sync)
> 2016-11-04T19:32:56.869023+01:00 ventsi-clst1 pengine[6115]:   notice:
> Recover DRBD_global_clst#011(Started ventsi-clst2-sync -> ventsi-clst1-sync)
> 2016-11-04T19:32:56.869770+01:00 ventsi-clst1 pengine[6115]:   notice:
> Calculated Transition 204: /var/lib/pacemaker/pengine/pe-input-768.bz2
> 2016-11-04T19:32:56.870065+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 3: stop DRBD_global_clst_stop_0 on ventsi-clst2-sync
> 2016-11-04T19:32:56.908075+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 42: start DRBD_global_clst_start_0 on
> ventsi-clst1-sync (local)
> 2016-11-04T19:32:56.931072+01:00 ventsi-clst1
> Filesystem(DRBD_global_clst)[16242]: INFO: Running start for /dev/drbd1
> on /drbdmnts/global_clst
> 2016-11-04T19:32:56.943250+01:00 ventsi-clst1 kernel: EXT4-fs (drbd1):
> warning: maximal mount count reached, running e2fsck is recommended
> 2016-11-04T19:32:56.953253+01:00 ventsi-clst1 kernel: EXT4-fs (drbd1):
> mounted filesystem with ordered data mode. Opts:
> 2016-11-04T19:32:56.964284+01:00 ventsi-clst1 crmd[6116]:   notice:
> Operation DRBD_global_clst_start_0: ok (node=ventsi-clst1-sync,
> call=1225, rc=0, cib-update=1701, confirmed=true)
> 2016-11-04T19:32:56.965104+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 10: start IPaddrNFS_start_0 on ventsi-clst1-sync (local)
> 2016-11-04T19:32:56.965325+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 43: monitor DRBD_global_clst_monitor_20000 on
> ventsi-clst1-sync (local)
> 2016-11-04T19:32:56.996235+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]:
> INFO: Adding inet address xxx.xxx.xxx.xxx/24 with broadcast address
> xxx.xxx.xxx.255 to device bond0
> 2016-11-04T19:32:57.002059+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]:
> INFO: Bringing device bond0 up
> 2016-11-04T19:32:57.008128+01:00 ventsi-clst1 IPaddr2(IPaddrNFS)[16308]:
> INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
> /var/run/resource-agents/send_arp-xxx.xxx.xxx.xxx bond0 xxx.xxx.xxx.xxx
> auto not_used not_used
> 2016-11-04T19:32:57.020159+01:00 ventsi-clst1 crmd[6116]:   notice:
> Operation IPaddrNFS_start_0: ok (node=ventsi-clst1-sync, call=1226,
> rc=0, cib-update=1703, confirmed=true)
> 2016-11-04T19:32:57.020901+01:00 ventsi-clst1 crmd[6116]:   notice:
> Initiating action 11: monitor IPaddrNFS_monitor_5000 on
> ventsi-clst1-sync (local)
> 2016-11-04T19:32:57.052231+01:00 ventsi-clst1 crmd[6116]:   notice:
> Transition 204 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-768.bz2): Complete
> 2016-11-04T19:32:57.052251+01:00 ventsi-clst1 crmd[6116]:   notice:
> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> ==================================================================
>
> Any ideas what could be the reason for this behavior?
> And how could this be fixed?
>
>
> (I already found several articles on the internet recommending
> two separately configured monitor operations for the DRBD resource,
> one for the master role and another one for the slave role.
> I already tried this, to no avail.)
>
> Regards
> Andi