[ClusterLabs] Help understanding recover of promotable resource after a "pcs cluster stop --all"

Ken Gaillot kgaillot at redhat.com
Mon May 2 11:26:01 EDT 2022


On Mon, 2022-05-02 at 09:58 -0300, Salatiel Filho wrote:
> Hi, I am trying to understand the recovery process of a promotable
> resource after "pcs cluster stop --all" and a shutdown of both nodes.
> I have a two-node cluster plus a qdevice for quorum, with a DRBD
> resource.
> 
> This is a summary of the resources before my test. Everything is
> working just fine and server2 is the master of DRBD.
> 
>  * fence-server1    (stonith:fence_vmware_rest):     Started server2
>  * fence-server2    (stonith:fence_vmware_rest):     Started server1
>  * Clone Set: DRBDData-clone [DRBDData] (promotable):
>    * Masters: [ server2 ]
>    * Slaves: [ server1 ]
>  * Resource Group: nfs:
>    * drbd_fs    (ocf::heartbeat:Filesystem):     Started server2
> 
> 
> 
> Then I issue "pcs cluster stop --all". The cluster is stopped on both
> nodes as expected.
> Now I restart server1 (previously the slave) and power off server2
> (previously the master). When server1 restarts it fences server2, and
> I can see server2 starting in vCenter, but I pressed a key at the GRUB
> menu so that server2 would not actually boot and would instead stay
> "paused" at the GRUB screen.
> 
> SSH'ing to server1 and running "pcs status", I get:
> 
> Cluster name: cluster1
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: server1 (version 2.1.0-8.el8-7c3f660707) - partition
> with quorum
>   * Last updated: Mon May  2 09:52:03 2022
>   * Last change:  Mon May  2 09:39:22 2022 by root via cibadmin on
> server1
>   * 2 nodes configured
>   * 11 resource instances configured
> 
> Node List:
>   * Online: [ server1 ]
>   * OFFLINE: [ server2 ]
> 
> Full List of Resources:
>   * fence-server1    (stonith:fence_vmware_rest):     Stopped
>   * fence-server2    (stonith:fence_vmware_rest):     Started server1
>   * Clone Set: DRBDData-clone [DRBDData] (promotable):
>     * Slaves: [ server1 ]
>     * Stopped: [ server2 ]
>   * Resource Group: nfs:
>     * drbd_fs    (ocf::heartbeat:Filesystem):     Stopped
> 
> 
> So I can see there is quorum, but server1 is never promoted to DRBD
> master, so the remaining resources stay stopped until server2 is back.
> 1) What do I need to do to force the promotion and recover without
> restarting server2?
> 2) Why, if I instead reboot server2 and power off server1, can the
> cluster recover by itself?
> 
> 
> Thanks!
> 

You shouldn't need to force promotion; promoting the surviving node is
the default behavior in that situation. There must be something else in
the configuration that is preventing promotion.
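
If nothing stands out, crm_simulate can show what the scheduler would
do with the current cluster state, including the scores it is using;
something like this (run against the live CIB on server1) is a quick
way to check:

    # Show the scheduler's decisions for the live cluster, including
    # allocation/promotion scores and the actions it would schedule
    crm_simulate -sL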

The DRBD resource agent should set a promotion score for the node. You
can run "crm_mon -1A" to show all node attributes; there should be one
like "master-DRBDData" for the active node.

You can also show the constraints in the cluster to see if there is
anything relevant to the master role.
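For example:

    # Show all constraints with their IDs and scores; look for location
    # constraints or rules that affect the master (promoted) role
    pcs constraint --full

For what it's worth, DRBD's crm-fence-peer handler (if you have DRBD
fencing configured) adds a location constraint that bans the master
role until the peer's data is known to be up to date; a leftover
constraint like that would produce exactly this behavior.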
-- 
Ken Gaillot <kgaillot at redhat.com>


