[ClusterLabs] Pacemaker quorum behavior
Scott Greenlese
swgreenl at us.ibm.com
Thu Sep 8 17:31:16 UTC 2016
Hi Klaus, thanks for your prompt and thoughtful feedback...
Please see my answers nested below (in the sections entitled "Scott's reply").
Thanks!
- Scott
Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
INTERNET: swgreenl at us.ibm.com
PHONE: 8/293-7301 (845-433-7301) M/S: POK 42HA/P966
From: Klaus Wenninger <kwenning at redhat.com>
To: users at clusterlabs.org
Date: 09/08/2016 10:59 AM
Subject: Re: [ClusterLabs] Pacemaker quorum behavior
On 09/08/2016 03:55 PM, Scott Greenlese wrote:
>
> Hi all...
>
> I have a few very basic questions for the group.
>
> I have a 5 node (Linux on Z LPARs) pacemaker cluster with 100
> VirtualDomain pacemaker-remote nodes
> plus 100 "opaque" VirtualDomain resources. The cluster is configured
> to be 'symmetric' and I have no
> location constraints on the 200 VirtualDomain resources (other than to
> prevent the opaque guests
> from running on the pacemaker remote node resources). My quorum is set
> as:
>
> quorum {
> provider: corosync_votequorum
> }
>
> As an experiment, I powered down one LPAR in the cluster, leaving 4
> powered up with the pcsd service running on the survivors but
> corosync/pacemaker stopped (pcs cluster stop --all). I then started
> pacemaker/corosync on a single cluster
>
"pcs cluster stop" shuts down pacemaker & corosync on my test-cluster but
did you check the status of the individual services?
Scott's reply:
No, I only assumed that pacemaker was down because each cluster node
returned this for the pcs status command:
[root@zs95kj VD]# date;for host in zs93KLpcs1 zs95KLpcs1 zs95kjpcs1
zs93kjpcs1 ; do ssh $host pcs status; done
Wed Sep 7 15:49:27 EDT 2016
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node
What else should I check? The pcsd.service was still up, since I didn't
stop it anywhere. Should I have run ps -ef | grep -e pacemaker -e corosync
to check the state before assuming it was really down?
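One quick way to confirm would be via systemd (a minimal sketch, assuming
RHEL 7-style unit names corosync.service and pacemaker.service):

# "active" means the daemon is still running; "inactive"/"unknown" means stopped
for host in zs93KLpcs1 zs95KLpcs1 zs95kjpcs1 zs93kjpcs1 ; do
    ssh $host "systemctl is-active corosync pacemaker"
done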
> node (pcs cluster start), and this resulted in the 200 VirtualDomain
> resources activating on the single node.
> This was not what I was expecting. I assumed that no resources would
> activate / start on any cluster nodes
> until 3 out of the 5 total cluster nodes had pacemaker/corosync running.
>
> After starting pacemaker/corosync on the single host (zs95kjpcs1),
> this is what I see :
>
> [root@zs95kj VD]# date;pcs status |less
> Wed Sep 7 15:51:17 EDT 2016
> Cluster name: test_cluster_2
> Last updated: Wed Sep 7 15:51:18 2016 Last change: Wed Sep 7 15:30:12
> 2016 by hacluster via crmd on zs93kjpcs1
> Stack: corosync
> Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> partition with quorum
> 106 nodes and 304 resources configured
>
> Node zs93KLpcs1: pending
> Node zs93kjpcs1: pending
> Node zs95KLpcs1: pending
> Online: [ zs95kjpcs1 ]
> OFFLINE: [ zs90kppcs1 ]
>
> .
> .
> .
> PCSD Status:
> zs93kjpcs1: Online
> zs95kjpcs1: Online
> zs95KLpcs1: Online
> zs90kppcs1: Offline
> zs93KLpcs1: Online
>
> So, what exactly constitutes an "Online" vs. "Offline" cluster node
> w.r.t. quorum calculation? Seems like in my case, it's "pending" on 3
> nodes, so where does that fall? And why "pending"? What does that mean?
>
> Also, what exactly is the cluster's expected reaction to quorum loss?
> Cluster resources will be stopped or something else?
>
That depends on how you configure the cluster property no-quorum-policy
(default: stop).
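For context, these are the values Pacemaker 1.1 accepts for this property
(summarized from Pacemaker Explained; the pcs commands below are a sketch,
not verified on this cluster):

# Show the current value (if unset, the default applies)
pcs property show no-quorum-policy

# Accepted values and the reaction of a partition that loses quorum:
#   ignore  - continue managing all resources as if quorum were still held
#   freeze  - continue managing running resources, but don't recover
#             resources from nodes outside the partition
#   stop    - stop all resources in the affected partition (the default)
#   suicide - fence every node in the affected partition
pcs property set no-quorum-policy=stop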
Scott's reply:
This is how the policy is configured:
[root@zs95kj VD]# date;pcs config |grep quorum
Thu Sep 8 13:18:33 EDT 2016
no-quorum-policy: stop
What should I expect with the 'stop' setting?
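With stop, an inquorate partition should stop every resource it is running
and start nothing new. As a worked example (hedged, assuming the votequorum
defaults of one vote per node and no two_node or last_man_standing options):

# Quorum math for this 5-node cluster:
#   expected_votes = 5
#   quorum         = floor(5/2) + 1 = 3
# A lone started node holds 1 vote < 3, so it should be inquorate and,
# with no-quorum-policy=stop, should start no resources.
corosync-quorumtool -s    # prints Expected votes, Total votes, Quorum

The votequorum(5) man page also documents options such as wait_for_all and
last_man_standing that change when quorum is first granted, which may be
relevant to why the single node came up quorate.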
>
>
> Where can I find this documentation?
>
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/
Scott's reply:
OK, I'll keep looking through this doc, but I haven't yet found
no-quorum-policy clearly explained.
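The property and its default can also be listed locally (a sketch, assuming
a pcs build that supports the --all flag):

# List every cluster property with its current or default value
pcs property list --all | grep no-quorum-policy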
Thanks..
>
>
> Thanks!
>
> Scott Greenlese - IBM Solution Test Team.
_______________________________________________
Users mailing list: Users at clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org