[ClusterLabs] Help understanding recover of promotable resource after a "pcs cluster stop --all"

Salatiel Filho salatiel.filho at gmail.com
Mon May 2 08:58:40 EDT 2022


Hi, I am trying to understand the recovery process for a promotable
resource after a "pcs cluster stop --all" and a shutdown of both nodes.
I have a two-node cluster with a qdevice for quorum and a DRBD resource.
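For completeness, the qdevice was added in the usual way, along these
lines (the qnetd host name and the algorithm here are placeholders for
whatever I actually used):

    pcs quorum device add model net host=qnetd-host algorithm=ffsplit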

This is a summary of the resources before my test. Everything is
working just fine, and server2 is the DRBD master.

 * fence-server1    (stonith:fence_vmware_rest):     Started server2
 * fence-server2    (stonith:fence_vmware_rest):     Started server1
 * Clone Set: DRBDData-clone [DRBDData] (promotable):
   * Masters: [ server2 ]
   * Slaves: [ server1 ]
 * Resource Group: nfs:
   * drbd_fs    (ocf::heartbeat:Filesystem):     Started server2
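For reference, the DRBD clone and the filesystem were created more or
less like this (a sketch, not my exact commands; the DRBD resource name
"r0", the device and the mount point are placeholders):

    pcs resource create DRBDData ocf:linbit:drbd drbd_resource=r0 \
        promotable promoted-max=1 promoted-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true
    pcs resource create drbd_fs ocf:heartbeat:Filesystem \
        device=/dev/drbd0 directory=/srv/nfs fstype=xfs --group nfs
    # the group may only run where the clone is promoted, after promotion
    pcs constraint colocation add nfs with master DRBDData-clone INFINITY
    pcs constraint order promote DRBDData-clone then start nfs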



Then I issue "pcs cluster stop --all", and the cluster stops on both
nodes as expected.
Now I restart server1 (previously the slave) and power off server2
(previously the master). When server1 comes back up it fences server2,
and I can see server2 starting in vCenter, but I press a key at the
GRUB menu so that server2 does not actually boot; it just stays
paused on the GRUB screen.
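In shell terms, the test sequence was roughly the following (the quorum
check is just how I verified that server1 plus the qdevice stayed
quorate):

    pcs cluster stop --all    # clean stop on both nodes
    # reboot server1; power off server2 and hold it at the GRUB menu
    pcs quorum status         # on server1: still quorate via the qdevice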

SSH'ing into server1 and running "pcs status", I get:

Cluster name: cluster1
Cluster Summary:
  * Stack: corosync
  * Current DC: server1 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
  * Last updated: Mon May  2 09:52:03 2022
  * Last change:  Mon May  2 09:39:22 2022 by root via cibadmin on server1
  * 2 nodes configured
  * 11 resource instances configured

Node List:
  * Online: [ server1 ]
  * OFFLINE: [ server2 ]

Full List of Resources:
  * fence-server1    (stonith:fence_vmware_rest):     Stopped
  * fence-server2    (stonith:fence_vmware_rest):     Started server1
  * Clone Set: DRBDData-clone [DRBDData] (promotable):
    * Slaves: [ server1 ]
    * Stopped: [ server2 ]
  * Resource Group: nfs:
    * drbd_fs    (ocf::heartbeat:Filesystem):     Stopped


So I can see there is quorum, but server1 is never promoted to DRBD
master, so the remaining resources stay stopped until server2 comes
back.
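(In case it helps whoever answers: I assume the promotion scores can be
inspected with crm_simulate, along the lines of
"crm_simulate -sL | grep -i promo", if pacemaker 2.1 still prints them
in the scores output.)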
1) What do I need to do to force the promotion and recover without
restarting server2? (A sketch of what I am considering is below.)
2) Why is it that if I instead reboot server2 and power off server1,
the cluster can recover by itself?
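For question 1, is something along these lines the right direction?
This is only a sketch: "r0" is again a placeholder, and the
drbd-fence-by-handler constraint would only exist if the DRBD fencing
handler is involved at all:

    drbdadm status r0         # confirm the local disk is UpToDate
    pcs constraint location   # look for a drbd-fence-by-handler-* ban
    pcs constraint remove <constraint-id>   # only if the data is known good
    pcs resource cleanup DRBDData-clone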


Thanks!




Kind regards,
Salatiel

