[ClusterLabs] Pacemaker quorum behavior

Scott Greenlese swgreenl at us.ibm.com
Thu Sep 8 14:20:33 UTC 2016


Correction...

When I stopped pacemaker/corosync on the four powered-on / active cluster
nodes, I ran into a problem with the graceful method of stopping the
cluster (pcs cluster stop --all), so I ended up running 'pcs cluster kill'
individually on each of the four cluster nodes.  I then had to stop the
virtual domains manually via 'virsh destroy <guestname>' on each host.
Perhaps some residual node status is affecting my quorum?
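
For the record, this is roughly what I ran on each of the four surviving
hosts; the loop is a reconstruction from memory rather than a transcript,
and the guest names come from 'virsh list' rather than from the cluster:

# forcibly stop the corosync/pacemaker daemons on this node
pcs cluster kill

# then clean up any guests libvirt still shows as running on this host
for guest in $(virsh list --name); do
    virsh destroy "$guest"
done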

Thanks...

Scott Greenlese ... IBM Solutions Test,  Poughkeepsie, N.Y.
  INTERNET:  swgreenl at us.ibm.com
  PHONE:  8/293-7301 (845-433-7301)    M/S:  POK 42HA/P966




From:	Scott Greenlese/Poughkeepsie/IBM at IBMUS
To:	users at clusterlabs.org
Cc:	Si Bo Niu <niusibo at cn.ibm.com>, Scott
            Loveland/Poughkeepsie/IBM at IBMUS, Michael
            Tebolt/Poughkeepsie/IBM at IBMUS
Date:	09/08/2016 10:01 AM
Subject:	[ClusterLabs] Pacemaker quorum behavior



Hi all...

I have a few very basic questions for the group.

I have a 5-node (Linux on Z LPARs) Pacemaker cluster with 100 VirtualDomain
pacemaker-remote nodes plus 100 "opaque" VirtualDomain resources. The
cluster is configured as 'symmetric', and I have no location constraints
on the 200 VirtualDomain resources (other than constraints to keep the
opaque guests off the pacemaker-remote nodes). My quorum configuration is:

quorum {
provider: corosync_votequorum
}
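
For completeness, that is the entire quorum section; I have not set any of
the optional votequorum flags. My possibly-wrong reading of votequorum(5)
is that something like the following would make a freshly started partition
wait until all nodes have been seen before it becomes quorate. This is only
a sketch of what I might try, not what is currently configured:

quorum {
provider: corosync_votequorum
wait_for_all: 1
}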

As an experiment, I powered down one LPAR in the cluster, leaving 4 powered
up with the pcsd service running on the 4 survivors but corosync/pacemaker
stopped (pcs cluster stop --all) on all of them. I then started
pacemaker/corosync on a single cluster node (pcs cluster start), and this
resulted in all 200 VirtualDomain resources activating on that single node.
This was not what I expected. I assumed that no resources would start on
any cluster node until 3 of the 5 cluster nodes had pacemaker/corosync
running.
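
I did not think to capture the votequorum state at the time. Next time I
will grab it right after the single node comes up, with something like the
following (my understanding is that corosync-quorumtool reports the vote
counts and whether the partition is quorate):

# show expected votes, total votes, and the quorate flag for this partition
corosync-quorumtool -s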

After starting pacemaker/corosync on the single host (zs95kjpcs1), this is
what I see:

[root at zs95kj VD]# date;pcs status |less
Wed Sep 7 15:51:17 EDT 2016
Cluster name: test_cluster_2
Last updated: Wed Sep 7 15:51:18 2016 Last change: Wed Sep 7 15:30:12 2016
by hacluster via crmd on zs93kjpcs1
Stack: corosync
Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) - partition
with quorum
106 nodes and 304 resources configured

Node zs93KLpcs1: pending
Node zs93kjpcs1: pending
Node zs95KLpcs1: pending
Online: [ zs95kjpcs1 ]
OFFLINE: [ zs90kppcs1 ]

.
.
.
PCSD Status:
zs93kjpcs1: Online
zs95kjpcs1: Online
zs95KLpcs1: Online
zs90kppcs1: Offline
zs93KLpcs1: Online

So, what exactly constitutes an "Online" vs. "Offline" cluster node with
respect to the quorum calculation? In my case it shows "pending" on 3
nodes, so where does that fall? And why "pending"? What does that mean?
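
In case it is relevant, my (unverified) understanding is that the "PCSD
Status" section only reflects the pcsd daemon on each host, not corosync
membership, so next time I will also capture the corosync view, e.g.:

# corosync membership as pcs reports it
pcs status corosync

# raw membership entries from corosync's cmap database
corosync-cmapctl | grep members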

Also, what exactly is the cluster's expected reaction to quorum loss? Will
cluster resources be stopped, or will something else happen?
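
My guess is that this is governed by the no-quorum-policy cluster property,
which I have left at its default; I plan to confirm what it is set to with
something like:

# list all cluster properties, including defaults, and pick out the policy
pcs property list --all | grep no-quorum-policy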

Where can I find this documentation?

Thanks!

Scott Greenlese - IBM Solution Test Team.



Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
INTERNET: swgreenl at us.ibm.com
PHONE: 8/293-7301 (845-433-7301) M/S: POK 42HA/P966
_______________________________________________
Users mailing list: Users at clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

