[ClusterLabs] colocation/order for cloned resource + group being ignored
Salatiel Filho
salatiel.filho at gmail.com
Mon Apr 11 13:16:06 EDT 2022
On Mon, Apr 11, 2022 at 1:53 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>
> On 11.04.2022 19:02, Salatiel Filho wrote:
> > Hi, I am deploying pacemaker + drbd to provide high-availability
> > storage, and during troubleshooting tests I saw a strange behaviour
> > where the colocation constraint between the remaining resources and
> > the cloned group appears to be simply ignored.
> >
> > These are the constraints I have:
> > Location Constraints:
> > Ordering Constraints:
> > start DRBDData-clone then start nfs (kind:Mandatory)
> > Colocation Constraints:
> > nfs with DRBDData-clone (score:INFINITY)
> > Ticket Constraints:
> >
> >
> > The environment: I have a two-node cluster with a remote quorum
> > device. The test was to stop the quorum device and then stop the
> > node currently running all the services (node1).
> > The expected behaviour is that the remaining node cannot do anything
> > (partition without quorum) until it regains quorum.
> > This is the output of pcs status on node2 after powering off the
> > quorum device and node1.
> >
> > Some resources have been removed from the output to make this email cleaner.
> >
> > Cluster name: storage-drbd
> > Cluster Summary:
> > * Stack: corosync
> > * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition WITHOUT quorum
> > * Last updated: Mon Apr 11 12:28:06 2022
> > * Last change: Mon Apr 11 12:26:10 2022 by root via cibadmin on node2
> > * 2 nodes configured
> > * 11 resource instances configured
> >
> > Node List:
> > * Node node1: UNCLEAN (offline)
> > * Online: [ node2 ]
> >
> > Full List of Resources:
> > * fence-node1 (stonith:fence_vmware_rest): Started node2
> > * fence-node2 (stonith:fence_vmware_rest): Started node1 (UNCLEAN)
> > * Clone Set: DRBDData-clone [DRBDData] (promotable):
> > * DRBDData (ocf::linbit:drbd): Master node1 (UNCLEAN)
> > * Slaves: [ node2 ]
> > * Resource Group: nfs:
> > * vip_nfs (ocf::heartbeat:IPaddr2): Started node1 (UNCLEAN)
> > * drbd_fs (ocf::heartbeat:Filesystem): Started node1 (UNCLEAN)
> > * nfsd (ocf::heartbeat:nfsserver): Started node1 (UNCLEAN)
> >
> > Daemon Status:
> > corosync: active/enabled
> > pacemaker: active/enabled
> > pcsd: active/enabled
> >
> > As expected, node2 is without quorum and waiting. The problem
> > happened when I turned node1 back on. Quorum was re-established, but
> > the DRBD master was promoted on node2 while the nfs group started on
> > node1, even though I have both a start order and a colocation
> > constraint that should keep the cloned resource and the nfs group on
> > the same node.
> >
>
> No, you do not.
>
> >
> >
> > Cluster name: storage-drbd
> > Cluster Summary:
> > * Stack: corosync
> > * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
> > * Last updated: Mon Apr 11 12:29:08 2022
> > * Last change: Mon Apr 11 12:26:10 2022 by root via cibadmin on node2
> > * 2 nodes configured
> > * 11 resource instances configured
> >
> > Node List:
> > * Online: [ node1 node2 ]
> >
> > Full List of Resources:
> > * fence-node1 (stonith:fence_vmware_rest): Started node2
> > * fence-node2 (stonith:fence_vmware_rest): Started node1
> > * Clone Set: DRBDData-clone [DRBDData] (promotable):
> > * Masters: [ node2 ]
> > * Slaves: [ node1 ]
> > * Resource Group: nfs:
> > * vip_nfs (ocf::heartbeat:IPaddr2): Started node1
> > * drbd_fs (ocf::heartbeat:Filesystem): FAILED node1
> > * nfsd (ocf::heartbeat:nfsserver): Stopped
> >
> > Failed Resource Actions:
> > * drbd_fs_start_0 on node1 'error' (1): call=90, status='complete',
> > exitreason='Couldn't mount device [/dev/drbd0] as /exports/drbd0',
> > last-rc-change='2022-04-11 12:29:05 -03:00', queued=0ms, exec=2567ms
> >
> > Daemon Status:
> > corosync: active/enabled
> > pacemaker: active/enabled
> > pcsd: active/enabled
> >
> > Can anyone explain to me why the constraints are being ignored?
> >
>
> Your order/colocation constraints are against starting of the clone
> resource, not against the master role. If you need to order/colocate a
> resource against the master, you need to say so explicitly.
> Colocating/ordering against "start" is satisfied as soon as the cloned
> resource is started as a slave, before it gets promoted.
Thanks Andrei, I suppose these are the required constraints then:

# pcs constraint order promote DRBDData-clone then start nfs
# pcs constraint colocation add nfs with master DRBDData-clone INFINITY
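
For reference, once those two constraints are in place they should show
up in the CIB roughly like the sketch below (the constraint IDs here are
just placeholders for whatever pcs generates, and recent pacemaker may
report the role as "Promoted" instead of "Master"):

  <rsc_order id="order-DRBDData-clone-nfs" kind="Mandatory"
             first="DRBDData-clone" first-action="promote"
             then="nfs" then-action="start"/>
  <rsc_colocation id="colocation-nfs-DRBDData-clone-INFINITY"
                  score="INFINITY" rsc="nfs"
                  with-rsc="DRBDData-clone" with-rsc-role="Master"/>

I can then check with "pcs constraint" (or dump the raw XML with
"pcs cluster cib") that the order constraint reads
"promote DRBDData-clone then start nfs" and the colocation carries
with-rsc-role Master, so the nfs group should only ever be placed on the
node where DRBD is promoted.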