Have you checked with the DRBD commands whether the two nodes were in sync?<div><br><div><br></div><div>Also consider adding the shared directory, LVM, etc. into a single group; see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/s1-resourcegroupcreatenfs-haaa</div><div><br></div><div>Best Regards,</div><div>Strahil Nikolov<br><div> <br> <blockquote style="margin: 0 0 20px 0;"> <div style="font-family:Roboto, sans-serif; color:#6D00F6;"> <div>On Tue, May 3, 2022 at 0:25, Ken Gaillot</div><div>&lt;kgaillot@redhat.com&gt; wrote:</div> </div> <div style="padding: 10px 0 0 20px; margin: 10px 0 0 0; border-left: 1px solid #6D00F6;"> On Mon, 2022-05-02 at 13:11 -0300, Salatiel Filho wrote:<br clear="none">> Hi, Ken, here is the info you asked for.<br clear="none">> <br clear="none">> <br clear="none">> # pcs constraint<br clear="none">> Location Constraints:<br clear="none">>   Resource: fence-server1<br clear="none">>     Disabled on:<br clear="none">>       Node: server1 (score:-INFINITY)<br clear="none">>   Resource: fence-server2<br clear="none">>     Disabled on:<br clear="none">>       Node: server2 (score:-INFINITY)<br clear="none">> Ordering Constraints:<br clear="none">>   promote DRBDData-clone then start nfs (kind:Mandatory)<br clear="none">> Colocation Constraints:<br clear="none">>   nfs with DRBDData-clone (score:INFINITY) (rsc-role:Started)<br clear="none">> (with-rsc-role:Master)<br clear="none">> Ticket Constraints:<br clear="none">> <br clear="none">> # sudo crm_mon -1A<br clear="none">> ...<br clear="none">> Node Attributes:<br clear="none">>   * Node: server2:<br clear="none">>     * master-DRBDData                     : 10000<br clear="none"><br clear="none">In the scenario you described, only server1 is up. If there is no<br clear="none">master score for server1, it cannot be master. It's up to the resource<br clear="none">agent to set it. 
I'm not familiar enough with that agent to know why it<br clear="none">might not.<div class="yqt4371512845" id="yqtfd83496"><br clear="none"><br clear="none">> <br clear="none">> <br clear="none">> <br clear="none">> Atenciosamente/Kind regards,<br clear="none">> Salatiel<br clear="none">> <br clear="none">> On Mon, May 2, 2022 at 12:26 PM Ken Gaillot <<a shape="rect" ymailto="mailto:kgaillot@redhat.com" href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br clear="none">> wrote:<br clear="none">> > On Mon, 2022-05-02 at 09:58 -0300, Salatiel Filho wrote:<br clear="none">> > > Hi, I am trying to understand the recovering process of a<br clear="none">> > > promotable<br clear="none">> > > resource after "pcs cluster stop --all" and shutdown of both<br clear="none">> > > nodes.<br clear="none">> > > I have a two nodes + qdevice quorum with a DRBD resource.<br clear="none">> > > <br clear="none">> > > This is a summary of the resources before my test. Everything is<br clear="none">> > > working just fine and server2 is the master of DRBD.<br clear="none">> > > <br clear="none">> > >  * fence-server1    (stonith:fence_vmware_rest):     Started<br clear="none">> > > server2<br clear="none">> > >  * fence-server2    (stonith:fence_vmware_rest):     Started<br clear="none">> > > server1<br clear="none">> > >  * Clone Set: DRBDData-clone [DRBDData] (promotable):<br clear="none">> > >    * Masters: [ server2 ]<br clear="none">> > >    * Slaves: [ server1 ]<br clear="none">> > >  * Resource Group: nfs:<br clear="none">> > >    * drbd_fs    (ocf::heartbeat:Filesystem):     Started server2<br clear="none">> > > <br clear="none">> > > <br clear="none">> > > <br clear="none">> > > then I issue "pcs cluster stop --all". 
The cluster will be<br clear="none">> > > stopped on<br clear="none">> > > both nodes as expected.<br clear="none">> > > Now I restart server1( previously the slave ) and poweroff<br clear="none">> > > server2 (<br clear="none">> > > previously the master ). When server1 restarts it will fence<br clear="none">> > > server2<br clear="none">> > > and I can see that server2 is starting on vcenter, but I just<br clear="none">> > > pressed<br clear="none">> > > any key on grub to make sure the server2 would not restart,<br clear="none">> > > instead<br clear="none">> > > it<br clear="none">> > > would just be "paused" on grub screen.<br clear="none">> > > <br clear="none">> > > SSH'ing to server1 and running pcs status I get:<br clear="none">> > > <br clear="none">> > > Cluster name: cluster1<br clear="none">> > > Cluster Summary:<br clear="none">> > >   * Stack: corosync<br clear="none">> > >   * Current DC: server1 (version 2.1.0-8.el8-7c3f660707) -<br clear="none">> > > partition<br clear="none">> > > with quorum<br clear="none">> > >   * Last updated: Mon May  2 09:52:03 2022<br clear="none">> > >   * Last change:  Mon May  2 09:39:22 2022 by root via cibadmin<br clear="none">> > > on<br clear="none">> > > server1<br clear="none">> > >   * 2 nodes configured<br clear="none">> > >   * 11 resource instances configured<br clear="none">> > > <br clear="none">> > > Node List:<br clear="none">> > >   * Online: [ server1 ]<br clear="none">> > >   * OFFLINE: [ server2 ]<br clear="none">> > > <br clear="none">> > > Full List of Resources:<br clear="none">> > >   * fence-server1    (stonith:fence_vmware_rest):     Stopped<br clear="none">> > >   * fence-server2    (stonith:fence_vmware_rest):     Started<br clear="none">> > > server1<br clear="none">> > >   * Clone Set: DRBDData-clone [DRBDData] (promotable):<br clear="none">> > >     * Slaves: [ server1 ]<br clear="none">> > >     * Stopped: [ server2 ]<br clear="none">> > >   * Resource Group: nfs:<br clear="none">> > >     
* drbd_fs    (ocf::heartbeat:Filesystem):     Stopped<br clear="none">> > > <br clear="none">> > > <br clear="none">> > > So I can see there is quorum, but the server1 is never promoted<br clear="none">> > > as<br clear="none">> > > DRBD master, so the remaining resources will be stopped until<br clear="none">> > > server2<br clear="none">> > > is back.<br clear="none">> > > 1) What do I need to do to force the promotion and recover<br clear="none">> > > without<br clear="none">> > > restarting server2?<br clear="none">> > > 2) Why if instead of rebooting server1 and power off server2 I<br clear="none">> > > reboot<br clear="none">> > > server2 and poweroff server1 the cluster can recover by itself?<br clear="none">> > > <br clear="none">> > > <br clear="none">> > > Thanks!<br clear="none">> > > <br clear="none">> > <br clear="none">> > You shouldn't need to force promotion, that is the default behavior<br clear="none">> > in<br clear="none">> > that situation. There must be something else in the configuration<br clear="none">> > that<br clear="none">> > is preventing promotion.<br clear="none">> > <br clear="none">> > The DRBD resource agent should set a promotion score for the node.<br clear="none">> > You<br clear="none">> > can run "crm_mon -1A" to show all node attributes; there should be<br clear="none">> > one<br clear="none">> > like "master-DRBDData" for the active node.<br clear="none">> > <br clear="none">> > You can also show the constraints in the cluster to see if there is<br clear="none">> > anything relevant to the master role.<br clear="none"><br clear="none">-- <br clear="none">Ken Gaillot <<a shape="rect" ymailto="mailto:kgaillot@redhat.com" href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br clear="none"></div> </div> </blockquote></div></div></div>
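<div>As a rough sketch of the check Ken describes, the helper below pulls the promotion score for each node out of "crm_mon -1A" output. The function name promotion_scores is made up for illustration, and master-DRBDData is simply the attribute name shown in this thread; the parsing assumes the "* Node: name:" layout quoted above and may need adjusting for other Pacemaker versions.</div>

```shell
#!/bin/sh
# Hypothetical helper: read `crm_mon -1A` output on stdin and print
# "<node> <score>" for every node that carries the given attribute.
# The default attribute name is the one quoted in this thread.
promotion_scores() {
    awk -v attr="${1:-master-DRBDData}" '
        # Remember the node whose attribute block we are in,
        # e.g. "  * Node: server2:" -> server2
        /\* Node:/ { node = $3; sub(/:$/, "", node) }
        # Attribute lines look like "    * master-DRBDData   : 10000"
        $2 == attr { print node, $NF }
    '
}

# Usage on a cluster node (not executed here):
#   crm_mon -1A | promotion_scores
#   crm_mon -1A | promotion_scores master-DRBDData
```

<div>If the node that is still up does not appear in the result, that matches what Ken points out: without a master score for that node, it cannot be promoted.</div>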