[ClusterLabs] Pacemaker quorum behavior

Fri Sep 9 09:27:34 UTC 2016

On 09/08/2016 07:31 PM, Scott Greenlese wrote:
>
> Hi Klaus, thanks for your prompt and thoughtful feedback...
>
> Please see my answers nested below (sections entitled, "Scott's
> Reply"). Thanks!
>
> - Scott
>
>
> Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgreenl at us.ibm.com
> PHONE: 8/293-7301 (845-433-7301) M/S: POK 42HA/P966
>
>
> From: Klaus Wenninger <kwenning at redhat.com>
> To: users at clusterlabs.org
> Date: 09/08/2016 10:59 AM
> Subject: Re: [ClusterLabs] Pacemaker quorum behavior
>
> ------------------------------------------------------------------------
>
>
>
> On 09/08/2016 03:55 PM, Scott Greenlese wrote:
> >
> > Hi all...
> >
> > I have a few very basic questions for the group.
> >
> > I have a 5 node (Linux on Z LPARs) pacemaker cluster with 100
> > VirtualDomain pacemaker-remote nodes
> > plus 100 "opaque" VirtualDomain resources. The cluster is configured
> > to be 'symmetric' and I have no
> > location constraints on the 200 VirtualDomain resources (other than to
> > prevent the opaque guests
> > from running on the pacemaker remote node resources). My quorum is set
> > as:
> >
> > quorum {
> > provider: corosync_votequorum
> > }
> >
> > As an experiment, I powered down one LPAR in the cluster, leaving 4
> > powered up with the pcsd service up on the 4 survivors
> > but corosync/pacemaker down (pcs cluster stop --all) on the 4
> > survivors. I then started pacemaker/corosync on a single cluster
> >
>
> "pcs cluster stop" shuts down pacemaker & corosync on my test-cluster but
> did you check the status of the individual services?
>
> Scott's reply:
>
> No, I only assumed that pacemaker was down because I got this back on
> my pcs status
> command from each cluster node:
>
> [root at zs95kj VD]# date;for host in zs93KLpcs1 zs95KLpcs1 zs95kjpcs1
> zs93kjpcs1 ; do ssh $host pcs status; done
> Wed Sep 7 15:49:27 EDT 2016
> Error: cluster is not currently running on this node
> Error: cluster is not currently running on this node
> Error: cluster is not currently running on this node
> Error: cluster is not currently running on this node
>  
>
> What else should I check? The pcsd.service service was still up,
> since I did not stop it anywhere. Should I have run
> ps -ef | grep -e pacemaker -e corosync
> to check the state before assuming it was really down?
>
>
I guess the answer from Poki should guide you well here ...
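
As a rough illustration of checking the individual services (this is not
taken from Poki's answer), here is a minimal sketch. It assumes the daemons
run as systemd units named "corosync" and "pacemaker"; the function name
cluster_service_states is made up for this example.

```shell
#!/bin/sh
# Print the systemd state of each cluster daemon on the local node.
# Assumption: the units are named "corosync" and "pacemaker";
# adjust the list if your distribution names them differently.
cluster_service_states() {
    for svc in corosync pacemaker; do
        # "systemctl is-active" prints the unit state (active, inactive,
        # failed, ...) and exits non-zero when the unit is not active,
        # so the exit status is deliberately ignored here.
        state=$(systemctl is-active "$svc" 2>/dev/null) || true
        printf '%s: %s\n' "$svc" "${state:-unknown}"
    done
}
```

Running cluster_service_states locally on each of the four survivors would
show whether both daemons are really stopped, rather than inferring it from
the "cluster is not currently running" message alone.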
>
>
> > node (pcs cluster start), and this resulted in the 200 VirtualDomain
> > resources activating on the single node.
> > This was not what I was expecting. I assumed that no resources would
> > activate / start on any cluster nodes
> > until 3 out of the 5 total cluster nodes had pacemaker/corosync running.
> >
> > After starting pacemaker/corosync on the single host (zs95kjpcs1),
> > this is what I see :
> >
> > [root at zs95kj VD]# date;pcs status |less
> > Wed Sep 7 15:51:17 EDT 2016
> > Cluster name: test_cluster_2
> > Last updated: Wed Sep 7 15:51:18 2016 Last change: Wed Sep 7 15:30:12
> > 2016 by hacluster via crmd on zs93kjpcs1
> > Stack: corosync
> > Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> > partition with quorum
> > 106 nodes and 304 resources configured
> >
> > Node zs93KLpcs1: pending
> > Node zs93kjpcs1: pending
> > Node zs95KLpcs1: pending
> > Online: [ zs95kjpcs1 ]
> > OFFLINE: [ zs90kppcs1 ]
> >
> > .
> > .
> > .
> > PCSD Status:
> > zs93kjpcs1: Online
> > zs95kjpcs1: Online
> > zs95KLpcs1: Online
> > zs90kppcs1: Offline
> > zs93KLpcs1: Online
> >
> > So, what exactly constitutes an "Online" vs. "Offline" cluster node
> > w.r.t. quorum calculation? Seems like in my case, it's "pending" on 3
> > nodes,
> > so where does that fall? And why "pending"? What does that mean?
> >
> > Also, what exactly is the cluster's expected reaction to quorum loss?
> > Cluster resources will be stopped or something else?
> >
> Depends on how you configure it using cluster property no-quorum-policy
> (default: stop).
>
> Scott's reply:
>
> This is how the policy is configured:
>
> [root at zs95kj VD]# date;pcs config |grep quorum
> Thu Sep  8 13:18:33 EDT 2016
>  no-quorum-policy: stop
>
> What should I expect with the 'stop' setting?
>
>
> >
> >
> > Where can I find this documentation?
> >
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/
>
> Scott's reply:
>
> OK, I'll keep looking through this doc, but I can't easily find where
> no-quorum-policy is explained.
>
Well, the index leads you to:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-cluster-options.html
where you find an exhaustive description of the option.

In short:
you are running the default (stop), which causes all resources in a
partition without quorum to be stopped.
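
To make the arithmetic concrete, here is a small sketch of the majority
rule and of what each no-quorum-policy value does in Pacemaker 1.1. It
assumes one vote per node and none of the optional votequorum features
(two_node, last_man_standing and so on); the function names are invented
for this example and are not part of pcs or corosync.

```shell
#!/bin/sh
# Votes needed for a majority, assuming one vote per node and no
# optional votequorum features enabled.
votes_needed() {
    expected_votes=$1
    echo $(( expected_votes / 2 + 1 ))
}

# Exit status 0 when a partition holding $1 of $2 expected votes
# reaches the majority threshold.
has_quorum() {
    [ "$1" -ge "$(votes_needed "$2")" ]
}

# One-line summary of each no-quorum-policy value (Pacemaker 1.1).
describe_policy() {
    case "$1" in
        ignore)  echo "continue all resource management" ;;
        freeze)  echo "keep managing resources, but do not recover resources from outside the partition" ;;
        stop)    echo "stop all resources in the partition" ;;
        suicide) echo "fence all nodes in the partition" ;;
        *)       echo "unknown policy: $1" >&2; return 1 ;;
    esac
}

votes_needed 5    # prints 3
if has_quorum 1 5; then
    echo "quorate"
else
    describe_policy stop    # prints: stop all resources in the partition
fi
```

By this rule a single node out of five would not be quorate, so the
"partition with quorum" line in the pcs status output above does not match
the simple majority calculation. Comparing it against what corosync itself
reports (corosync-quorumtool -s) may help narrow that down.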

> Thanks..
>
>
> >
> >
> > Thanks!
> >
> > Scott Greenlese - IBM Solution Test Team.
> >
> >
> >
> >
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > http://clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
