[ClusterLabs] colocation/order for cloned resource + group being ignored

Salatiel Filho salatiel.filho at gmail.com
Mon Apr 11 13:16:06 EDT 2022


On Mon, Apr 11, 2022 at 1:53 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>
> On 11.04.2022 19:02, Salatiel Filho wrote:
> > Hi, I am deploying pacemaker + drbd to provide high-availability
> > storage, and during the troubleshooting tests I got a strange
> > behaviour where the colocation constraint between the remaining
> > resources and the cloned group appears to be just ignored.
> >
> > These are the constraints I have:
> > Location Constraints:
> > Ordering Constraints:
> >   start DRBDData-clone then start nfs (kind:Mandatory)
> > Colocation Constraints:
> >   nfs with DRBDData-clone (score:INFINITY)
> > Ticket Constraints:
> >
> >
> > The environment: I have a two-node cluster with a remote quorum
> > device. The test was to stop the quorum device and afterwards stop
> > the node currently running all the services (node1).
> > The expected behaviour would be that the remaining node would not be
> > able to do anything (partition without quorum) until it regains quorum.
> > This is the output of pcs status on node2 after powering off the
> > quorum device and node1.
> >
> > Some resources have been removed from the output to make this email cleaner.
> >
> > Cluster name: storage-drbd
> > Cluster Summary:
> >   * Stack: corosync
> >   * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition
> > WITHOUT quorum
> >   * Last updated: Mon Apr 11 12:28:06 2022
> >   * Last change:  Mon Apr 11 12:26:10 2022 by root via cibadmin on node2
> >   * 2 nodes configured
> >   * 11 resource instances configured
> >
> > Node List:
> >   * Node node1: UNCLEAN (offline)
> >   * Online: [ node2 ]
> >
> > Full List of Resources:
> >   * fence-node1  (stonith:fence_vmware_rest):     Started node2
> >   * fence-node2  (stonith:fence_vmware_rest):     Started node1 (UNCLEAN)
> >   * Clone Set: DRBDData-clone [DRBDData] (promotable):
> >     * DRBDData  (ocf::linbit:drbd):     Master node1 (UNCLEAN)
> >     * Slaves: [ node2 ]
> >   * Resource Group: nfs:
> >     * vip_nfs   (ocf::heartbeat:IPaddr2):        Started node1 (UNCLEAN)
> >     * drbd_fs   (ocf::heartbeat:Filesystem):     Started node1 (UNCLEAN)
> >     * nfsd    (ocf::heartbeat:nfsserver):     Started node1 (UNCLEAN)
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> >
> >
> >
> >
> >
> >
> > As expected, node2 is without quorum and waiting. The problem
> > happened when I brought node1 back. Quorum was re-established, but
> > the DRBD master was promoted on node2 while the nfs group tried to
> > start on node1, even though I have both a start order and a
> > colocation constraint to make the cloned resource and the NFS group
> > run on the same node.
> >
>
> No. You do not.
>
> >
> >
> > Cluster name: storage-drbd
> > Cluster Summary:
> >   * Stack: corosync
> >   * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
> >   * Last updated: Mon Apr 11 12:29:08 2022
> >   * Last change:  Mon Apr 11 12:26:10 2022 by root via cibadmin on node2
> >   * 2 nodes configured
> >   * 11 resource instances configured
> >
> > Node List:
> >   * Online: [ node1 node2 ]
> >
> > Full List of Resources:
> >   * fence-node1  (stonith:fence_vmware_rest):     Started node2
> >   * fence-node2  (stonith:fence_vmware_rest):     Started node1
> >   * Clone Set: DRBDData-clone [DRBDData] (promotable):
> >     * Masters: [ node2 ]
> >     * Slaves: [ node1 ]
> >   * Resource Group: nfs:
> >     * vip_nfs   (ocf::heartbeat:IPaddr2):        Started node1
> >     * drbd_fs   (ocf::heartbeat:Filesystem):     FAILED node1
> >     * nfsd    (ocf::heartbeat:nfsserver):     Stopped
> >
> > Failed Resource Actions:
> >   * drbd_fs_start_0 on node1 'error' (1): call=90, status='complete',
> > exitreason='Couldn't mount device [/dev/drbd0] as /exports/drbd0',
> > last-rc-change='2022-04-11 12:29:05 -03:00', queued=0ms, exec=2567ms
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> >
> >
> >
> >
> >
> > Can anyone explain to me why the constraints are being ignored?
> >
>
> Your order/colocation is against starting of the clone resource, not
> against the master role. If you need to order/colocate a resource
> against the master, you need to say so explicitly. Colocating/ordering
> against "start" is satisfied as soon as the cloned resource is started
> as a slave, before it gets promoted.


Thanks Andrei, I suppose these are the required constraints then:

# pcs constraint order promote DRBDData-clone then start nfs
# pcs constraint colocation add nfs with master DRBDData-clone INFINITY
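
For my own reference, if I understand the syntax correctly, those two
commands should end up in the CIB roughly as the entries below (the ids
are made up by me; the parts that matter are first-action="promote" and
with-rsc-role="Master"):

  <!-- illustrative only: ids are my guesses, attributes per my reading of the docs -->
  <rsc_order id="order-DRBDData-clone-nfs" kind="Mandatory"
             first="DRBDData-clone" first-action="promote"
             then="nfs" then-action="start"/>
  <rsc_colocation id="colocation-nfs-with-DRBDData-clone" score="INFINITY"
                  rsc="nfs" with-rsc="DRBDData-clone"
                  with-rsc-role="Master"/>

That would match your explanation: my original constraints only referenced
"start" of DRBDData-clone, which is already satisfied while it runs as a
slave. I'll apply these and check the result with "pcs constraint --full".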

> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

