[ClusterLabs] Help understanding recovery of a promotable resource after a "pcs cluster stop --all"
Salatiel Filho
salatiel.filho at gmail.com
Mon May 2 08:58:40 EDT 2022
Hi, I am trying to understand the recovery process of a promotable
resource after "pcs cluster stop --all" and a shutdown of both nodes.
I have a two-node cluster plus a qdevice for quorum, with a DRBD resource.
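For reference, the promotable resource was created with something like
the commands below (a sketch from memory; the DRBD resource name "r0"
and the exact option values are my assumptions, not copied from the
live config, and I have omitted the usual order/colocation constraints
with the nfs group):

  pcs resource create DRBDData ocf:linbit:drbd drbd_resource=r0 \
      op monitor interval=29s role=Master \
      op monitor interval=31s role=Slave
  pcs resource promotable DRBDData \
      promoted-max=1 promoted-node-max=1 \
      clone-max=2 clone-node-max=1 notify=true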
This is a summary of the resources before my test. Everything is
working just fine and server2 is the DRBD master:
* fence-server1 (stonith:fence_vmware_rest): Started server2
* fence-server2 (stonith:fence_vmware_rest): Started server1
* Clone Set: DRBDData-clone [DRBDData] (promotable):
* Masters: [ server2 ]
* Slaves: [ server1 ]
* Resource Group: nfs:
* drbd_fs (ocf::heartbeat:Filesystem): Started server2
Then I issue "pcs cluster stop --all", and the cluster stops on
both nodes as expected.
Now I restart server1 (previously the slave) and power off server2
(previously the master). When server1 comes back up it fences server2,
and I can see server2 starting in vCenter, but I press a key at the
GRUB menu so that server2 does not actually boot; instead it just sits
"paused" at the GRUB screen.
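To summarize, the test sequence was roughly (commands paraphrased,
run as root; the restart/poweroff were done from vCenter):

  pcs cluster stop --all   # stop the cluster on both nodes
  # restart server1 (the previous slave)
  # power off server2 (the previous master), hold it at the GRUB menu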
SSH'ing into server1 and running "pcs status", I get:
Cluster name: cluster1
Cluster Summary:
* Stack: corosync
* Current DC: server1 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon May 2 09:52:03 2022
* Last change: Mon May 2 09:39:22 2022 by root via cibadmin on server1
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ server1 ]
* OFFLINE: [ server2 ]
Full List of Resources:
* fence-server1 (stonith:fence_vmware_rest): Stopped
* fence-server2 (stonith:fence_vmware_rest): Started server1
* Clone Set: DRBDData-clone [DRBDData] (promotable):
* Slaves: [ server1 ]
* Stopped: [ server2 ]
* Resource Group: nfs:
* drbd_fs (ocf::heartbeat:Filesystem): Stopped
So I can see there is quorum, but server1 is never promoted to DRBD
master, and the remaining resources stay stopped until server2
comes back.
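For what it's worth, this is roughly how I check the DRBD state
directly on server1 (again assuming the DRBD resource is named r0;
I have not pasted the output here):

  drbdadm status r0   # connection and role as DRBD itself sees them
  drbdadm dstate r0   # disk state, e.g. UpToDate vs Outdated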
1) What do I need to do to force the promotion and recover without
restarting server2? (A few things I considered are sketched after
question 2 below.)
2) Why is it that if, instead of rebooting server1 and powering off
server2, I reboot server2 and power off server1, the cluster can
recover by itself?
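For question 1, these are the things I considered trying, but I have
not verified that any of them is the right approach (the constraint
and resource names here are my guesses):

  pcs constraint location --full       # look for a drbd-fence-by-handler-* constraint
  pcs resource cleanup DRBDData-clone  # clear any stale failed actions
  drbdadm primary r0                   # promote DRBD by hand, behind the cluster's back?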
Thanks!
Atenciosamente/Kind regards,
Salatiel