Hi Klaus, thanks for your prompt and thoughtful feedback...

Please see my answers nested below (sections entitled "Scott's reply"). Thanks!

- Scott


Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
  INTERNET: swgreenl@us.ibm.com
  PHONE: 8/293-7301 (845-433-7301)  M/S: POK 42HA/P966


From: Klaus Wenninger <kwenning@redhat.com>
To: users@clusterlabs.org
Date: 09/08/2016 10:59 AM
Subject: Re: [ClusterLabs] Pacemaker quorum behavior

----------------------------------------------------------------------

On 09/08/2016 03:55 PM, Scott Greenlese wrote:
>
> Hi all...
>
> I have a few very basic questions for the group.
>
> I have a 5-node (Linux on Z LPARs) pacemaker cluster with 100
> VirtualDomain pacemaker-remote nodes plus 100 "opaque" VirtualDomain
> resources. The cluster is configured to be 'symmetric', and I have no
> location constraints on the 200 VirtualDomain resources (other than to
> prevent the opaque guests from running on the pacemaker-remote node
> resources). My quorum is set as:
>
> quorum {
>     provider: corosync_votequorum
> }
>
> As an experiment, I powered down one LPAR in the cluster, leaving 4
> powered up with the pcsd service up on the 4 survivors, but
> corosync/pacemaker down (pcs cluster stop --all) on the 4 survivors.
> I then started pacemaker/corosync on a single cluster

"pcs cluster stop" shuts down pacemaker & corosync on my test-cluster, but
did you check the status of the individual services?

Scott's reply:

No, I only assumed that pacemaker was down because I got this back on the
pcs status command from each cluster node:

[root@zs95kj VD]# date;for host in zs93KLpcs1 zs95KLpcs1 zs95kjpcs1 zs93kjpcs1 ; do ssh $host pcs status; done
Wed Sep 7 15:49:27 EDT 2016
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node

What else should I check? The pcsd.service service was still up, since I
didn't stop it anywhere. Should I have run "ps -ef | grep -e pacemaker -e
corosync" to check the state before assuming it was really down?
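
Scott again - for next time, here is the quick check I plan to use (just a
sketch; I'm assuming the standard systemd unit names, pacemaker and
corosync, on these RHEL 7 nodes):

[root@zs95kj VD]# for host in zs93KLpcs1 zs95KLpcs1 zs95kjpcs1 zs93kjpcs1 ; do echo "== $host =="; ssh $host "systemctl is-active pacemaker corosync"; done

systemctl is-active should print one state line per unit (active/inactive),
which seems like a more direct check than grepping ps output.
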
> node (pcs cluster start), and this resulted in the 200 VirtualDomain
> resources activating on the single node.
> This was not what I was expecting. I assumed that no resources would
> activate / start on any cluster nodes until 3 out of the 5 total
> cluster nodes had pacemaker/corosync running.
>
> After starting pacemaker/corosync on the single host (zs95kjpcs1),
> this is what I see:
>
> [root@zs95kj VD]# date;pcs status |less
> Wed Sep 7 15:51:17 EDT 2016
> Cluster name: test_cluster_2
> Last updated: Wed Sep 7 15:51:18 2016  Last change: Wed Sep 7 15:30:12
> 2016 by hacluster via crmd on zs93kjpcs1
> Stack: corosync
> Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> partition with quorum
> 106 nodes and 304 resources configured
>
> Node zs93KLpcs1: pending
> Node zs93kjpcs1: pending
> Node zs95KLpcs1: pending
> Online: [ zs95kjpcs1 ]
> OFFLINE: [ zs90kppcs1 ]
>
> .
> .
> .
> PCSD Status:
>   zs93kjpcs1: Online
>   zs95kjpcs1: Online
>   zs95KLpcs1: Online
>   zs90kppcs1: Offline
>   zs93KLpcs1: Online
>
> So, what exactly constitutes an "Online" vs. "Offline" cluster node
> w.r.t. quorum calculation? It seems like in my case it's "pending" on
> 3 nodes, so where does that fall? And why "pending"? What does that
> mean?
>
> Also, what exactly is the cluster's expected reaction to quorum loss?
> Will cluster resources be stopped, or something else?
>
Depends on how you configure it using the cluster property no-quorum-policy
(default: stop).

Scott's reply:

This is how the policy is configured:

[root@zs95kj VD]# date;pcs config |grep quorum
Thu Sep 8 13:18:33 EDT 2016
 no-quorum-policy: stop

What should I expect with the 'stop' setting?
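
Scott again - while I wait, here is my working understanding of the quorum
arithmetic (my own reading of the corosync defaults; please correct me if
it's wrong): with corosync_votequorum and one vote per node, quorum =
floor(expected_votes / 2) + 1, so for this 5-node cluster that is
floor(5/2) + 1 = 3 votes. A single freshly started node should therefore
be inquorate. Given that pcs status reported "partition with quorum" with
three nodes "pending", I suspect corosync was actually still running on
those three nodes, which would also explain why the resources started. I
plan to verify with:

[root@zs95kj VD]# corosync-quorumtool -s

which should report Expected votes, Total votes, and a "Quorate: Yes/No"
line for the partition.
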
>
>
> Where can I find this documentation?
>
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/

Scott's reply:

OK, I'll keep looking through this doc, but I don't easily find the
no-quorum-policy option explained.

Thanks..
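
Scott again - after a closer look, I believe the relevant entry is the
no-quorum-policy row in the "Cluster Options" table of Pacemaker Explained.
If I'm reading it correctly, the accepted values are ignore (continue
managing all resources), freeze (keep running what the partition already
has, but start nothing new), stop (stop all resources in the affected
partition), and suicide (fence all nodes in the affected partition). So
with our 'stop' setting, I'd expect an inquorate partition to stop all of
its resources. Please correct me if I've misread.
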
>
>
> Thanks!
>
> Scott Greenlese - IBM Solution Test Team.
>
>
> Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgreenl@us.ibm.com
> PHONE: 8/293-7301 (845-433-7301) M/S: POK 42HA/P966
>

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org