[ClusterLabs] Help understanding recovery of a promotable resource after a "pcs cluster stop --all"
Salatiel Filho
salatiel.filho at gmail.com
Mon May 2 12:11:01 EDT 2022
Hi, Ken, here is the info you asked for.
# pcs constraint
Location Constraints:
Resource: fence-server1
Disabled on:
Node: server1 (score:-INFINITY)
Resource: fence-server2
Disabled on:
Node: server2 (score:-INFINITY)
Ordering Constraints:
promote DRBDData-clone then start nfs (kind:Mandatory)
Colocation Constraints:
nfs with DRBDData-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:
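For reference, constraints equivalent to the above can be created with pcs commands roughly like these (a sketch only; the role keyword in the colocation command varies between pcs versions):

# pcs constraint location fence-server1 avoids server1=INFINITY
# pcs constraint location fence-server2 avoids server2=INFINITY
# pcs constraint order promote DRBDData-clone then start nfs
# pcs constraint colocation add nfs with master DRBDData-clone INFINITY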
# sudo crm_mon -1A
...
Node Attributes:
* Node: server2:
* master-DRBDData : 10000
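
If it helps, the per-node promotion score and the DRBD state can also be queried directly with something like the following (a sketch; the attribute name is taken from the crm_mon output above, and "drbdadm status" assumes DRBD 9):

# crm_attribute --lifetime reboot --name master-DRBDData --node server1 --query
# crm_attribute --lifetime reboot --name master-DRBDData --node server2 --query
# drbdadm status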
Atenciosamente/Kind regards,
Salatiel
On Mon, May 2, 2022 at 12:26 PM Ken Gaillot <kgaillot at redhat.com> wrote:
>
> On Mon, 2022-05-02 at 09:58 -0300, Salatiel Filho wrote:
> > Hi, I am trying to understand the recovery process of a promotable
> > resource after "pcs cluster stop --all" and a shutdown of both nodes.
> > I have a two-node cluster with a qdevice for quorum and a DRBD resource.
> >
> > This is a summary of the resources before my test. Everything is
> > working just fine and server2 is the master of DRBD.
> >
> > * fence-server1 (stonith:fence_vmware_rest): Started server2
> > * fence-server2 (stonith:fence_vmware_rest): Started server1
> > * Clone Set: DRBDData-clone [DRBDData] (promotable):
> > * Masters: [ server2 ]
> > * Slaves: [ server1 ]
> > * Resource Group: nfs:
> > * drbd_fs (ocf::heartbeat:Filesystem): Started server2
> >
> >
> >
> > Then I issue "pcs cluster stop --all", and the cluster stops on both
> > nodes as expected.
> > Now I restart server1 (previously the slave) and power off server2
> > (previously the master). When server1 restarts, it fences server2 and
> > I can see server2 starting in vCenter, but I press a key at the GRUB
> > menu so that server2 does not actually boot; it just sits "paused" at
> > the GRUB screen.
> >
> > SSH'ing into server1 and running "pcs status", I get:
> >
> > Cluster name: cluster1
> > Cluster Summary:
> > * Stack: corosync
> > * Current DC: server1 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
> > * Last updated: Mon May 2 09:52:03 2022
> > * Last change: Mon May 2 09:39:22 2022 by root via cibadmin on server1
> > * 2 nodes configured
> > * 11 resource instances configured
> >
> > Node List:
> > * Online: [ server1 ]
> > * OFFLINE: [ server2 ]
> >
> > Full List of Resources:
> > * fence-server1 (stonith:fence_vmware_rest): Stopped
> > * fence-server2 (stonith:fence_vmware_rest): Started server1
> > * Clone Set: DRBDData-clone [DRBDData] (promotable):
> > * Slaves: [ server1 ]
> > * Stopped: [ server2 ]
> > * Resource Group: nfs:
> > * drbd_fs (ocf::heartbeat:Filesystem): Stopped
> >
> >
> > So I can see there is quorum, but server1 is never promoted to DRBD
> > master, so the remaining resources stay stopped until server2 is back.
> > 1) What do I need to do to force the promotion and recover without
> > restarting server2?
> > 2) Why, if I instead reboot server2 and power off server1, can the
> > cluster recover by itself?
> >
> >
> > Thanks!
> >
>
> You shouldn't need to force promotion; that is the default behavior in
> that situation. There must be something else in the configuration that
> is preventing promotion.
>
> The DRBD resource agent should set a promotion score for the node. You
> can run "crm_mon -1A" to show all node attributes; there should be one
> like "master-DRBDData" for the active node.
>
> You can also show the constraints in the cluster to see if there is
> anything relevant to the master role.
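
(The constraints and node attributes are pasted at the top of this message. If it is useful, I can also dump the scores the scheduler computes from the live CIB with something like:

# crm_simulate -sL

and post the relevant lines.)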
> --
> Ken Gaillot <kgaillot at redhat.com>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/