[ClusterLabs] Help understanding recovery of a promotable resource after a "pcs cluster stop --all"
Salatiel Filho
salatiel.filho at gmail.com
Mon May 2 12:11:01 EDT 2022
Hi, Ken, here is the info you asked for.
# pcs constraint
Location Constraints:
Resource: fence-server1
Disabled on:
Node: server1 (score:-INFINITY)
Resource: fence-server2
Disabled on:
Node: server2 (score:-INFINITY)
Ordering Constraints:
promote DRBDData-clone then start nfs (kind:Mandatory)
Colocation Constraints:
nfs with DRBDData-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:
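For reference, constraints equivalent to the above can be created with pcs commands roughly like these (a sketch only; the role keyword in the colocation command varies between pcs versions):

# pcs constraint location fence-server1 avoids server1=INFINITY
# pcs constraint location fence-server2 avoids server2=INFINITY
# pcs constraint order promote DRBDData-clone then start nfs
# pcs constraint colocation add nfs with master DRBDData-clone INFINITY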
# sudo crm_mon -1A
...
Node Attributes:
* Node: server2:
* master-DRBDData : 10000
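
If it helps, the per-node promotion score and the DRBD state can also be queried directly with something like the following (a sketch; the attribute name is taken from the crm_mon output above, and "drbdadm status" assumes DRBD 9):

# crm_attribute --lifetime reboot --name master-DRBDData --node server1 --query
# crm_attribute --lifetime reboot --name master-DRBDData --node server2 --query
# drbdadm status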
Atenciosamente/Kind regards,
Salatiel
On Mon, May 2, 2022 at 12:26 PM Ken Gaillot <kgaillot at redhat.com> wrote:
>
> On Mon, 2022-05-02 at 09:58 -0300, Salatiel Filho wrote:
> > Hi, I am trying to understand the recovery process of a promotable
> > resource after "pcs cluster stop --all" and a shutdown of both nodes.
> > I have a two-node cluster with a qdevice for quorum and a DRBD resource.
> >
> > This is a summary of the resources before my test. Everything is
> > working just fine and server2 is the master of DRBD.
> >
> > * fence-server1 (stonith:fence_vmware_rest): Started server2
> > * fence-server2 (stonith:fence_vmware_rest): Started server1
> > * Clone Set: DRBDData-clone [DRBDData] (promotable):
> > * Masters: [ server2 ]
> > * Slaves: [ server1 ]
> > * Resource Group: nfs:
> > * drbd_fs (ocf::heartbeat:Filesystem): Started server2
> >
> >
> >
> > Then I issue "pcs cluster stop --all", and the cluster stops on both
> > nodes as expected.
> > Now I restart server1 (previously the slave) and power off server2
> > (previously the master). When server1 restarts, it fences server2 and
> > I can see server2 starting in vCenter, but I press a key at the GRUB
> > menu so that server2 does not actually boot; it just sits "paused" at
> > the GRUB screen.
> >
> > SSH'ing into server1 and running "pcs status", I get:
> >
> > Cluster name: cluster1
> > Cluster Summary:
> > * Stack: corosync
> > * Current DC: server1 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
> > * Last updated: Mon May 2 09:52:03 2022
> > * Last change: Mon May 2 09:39:22 2022 by root via cibadmin on server1
> > * 2 nodes configured
> > * 11 resource instances configured
> >
> > Node List:
> > * Online: [ server1 ]
> > * OFFLINE: [ server2 ]
> >
> > Full List of Resources:
> > * fence-server1 (stonith:fence_vmware_rest): Stopped
> > * fence-server2 (stonith:fence_vmware_rest): Started server1
> > * Clone Set: DRBDData-clone [DRBDData] (promotable):
> > * Slaves: [ server1 ]
> > * Stopped: [ server2 ]
> > * Resource Group: nfs:
> > * drbd_fs (ocf::heartbeat:Filesystem): Stopped
> >
> >
> > So I can see there is quorum, but server1 is never promoted to DRBD
> > master, so the remaining resources stay stopped until server2 is back.
> > 1) What do I need to do to force the promotion and recover without
> > restarting server2?
> > 2) Why, if I instead reboot server2 and power off server1, can the
> > cluster recover by itself?
> >
> >
> > Thanks!
> >
>
> You shouldn't need to force promotion; that is the default behavior in
> that situation. There must be something else in the configuration that
> is preventing promotion.
>
> The DRBD resource agent should set a promotion score for the node. You
> can run "crm_mon -1A" to show all node attributes; there should be one
> like "master-DRBDData" for the active node.
>
> You can also show the constraints in the cluster to see if there is
> anything relevant to the master role.
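
(The constraints and node attributes are pasted at the top of this message. If it is useful, I can also dump the scores the scheduler computes from the live CIB with something like:

# crm_simulate -sL

and post the relevant lines.)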
> --
> Ken Gaillot <kgaillot at redhat.com>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/