[ClusterLabs] What is the logic when two nodes are down at the same time and need to be fenced

Niu Sibo niusibo at linux.vnet.ibm.com
Mon Nov 7 15:59:08 UTC 2016


Hi Ken,

Thanks for the clarification. Now I have another real problem that needs 
your advice.

The cluster consists of 5 nodes, and one of the nodes had a 1-second 
network failure, which resulted in one of the VirtualDomain resources 
being started on two nodes at the same time. The cluster property 
no-quorum-policy is set to stop.
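
For reference, a minimal sketch of how that property is checked and set 
(assuming pcs is in use; exact syntax can vary by version, and 
crm_attribute works as well):

   # show the current value of the cluster property
   crm_attribute --type crm_config --name no-quorum-policy --query
   # set it; "stop" means resources are stopped in a partition without quorum
   pcs property set no-quorum-policy=stop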

At 16:13:34, this happened:
16:13:34 zs95kj attrd[133000]:  notice: crm_update_peer_proc: Node 
zs93KLpcs1[5] - state is now lost (was member)
16:13:34 zs95kj corosync[132974]:  [CPG   ] left_list[0] 
group:pacemakerd\x00, ip:r(0) ip(10.20.93.13) , pid:28721
16:13:34 zs95kj crmd[133002]: warning: No match for shutdown action on 5
16:13:34 zs95kj attrd[133000]:  notice: Removing all zs93KLpcs1 
attributes for attrd_peer_change_cb
16:13:34 zs95kj corosync[132974]:  [CPG   ] left_list_entries:1
16:13:34 zs95kj crmd[133002]:  notice: Stonith/shutdown of zs93KLpcs1 
not matched
...
16:13:35 zs95kj attrd[133000]:  notice: crm_update_peer_proc: Node 
zs93KLpcs1[5] - state is now member (was (null))

From the DC:
[root at zs95kj ~]# crm_simulate --xml-file 
/var/lib/pacemaker/pengine/pe-input-3288.bz2 |grep 110187
  zs95kjg110187_res      (ocf::heartbeat:VirtualDomain): Started 
zs93KLpcs1     <---------- This is the baseline, where everything works normally

[root at zs95kj ~]# crm_simulate --xml-file 
/var/lib/pacemaker/pengine/pe-input-3289.bz2 |grep 110187
  zs95kjg110187_res      (ocf::heartbeat:VirtualDomain): Stopped 
<----------- Here node zs93KLpcs1 lost its network for 1 second, which 
resulted in this state.

[root at zs95kj ~]# crm_simulate --xml-file 
/var/lib/pacemaker/pengine/pe-input-3290.bz2 |grep 110187
  zs95kjg110187_res      (ocf::heartbeat:VirtualDomain): Stopped

[root at zs95kj ~]# crm_simulate --xml-file 
/var/lib/pacemaker/pengine/pe-input-3291.bz2 |grep 110187
  zs95kjg110187_res      (ocf::heartbeat:VirtualDomain): Stopped
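
In case it helps to see more than the grep above, the full set of 
actions the policy engine computed can be replayed from the same saved 
file (a sketch; the --simulate option is listed in crm_simulate --help, 
so please verify it on your version):

   crm_simulate --simulate \
       --xml-file /var/lib/pacemaker/pengine/pe-input-3289.bz2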


From the DC's pengine log:
16:05:01 zs95kj pengine[133001]:  notice: Calculated Transition 238: 
/var/lib/pacemaker/pengine/pe-input-3288.bz2
...
16:13:41 zs95kj pengine[133001]:  notice: Start 
zs95kjg110187_res#011(zs90kppcs1)
...
16:13:41 zs95kj pengine[133001]:  notice: Calculated Transition 239: 
/var/lib/pacemaker/pengine/pe-input-3289.bz2

From the DC's crmd log:
Sep  9 16:05:25 zs95kj crmd[133002]:  notice: Transition 238 
(Complete=48, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-3288.bz2): Complete
...
Sep  9 16:13:42 zs95kj crmd[133002]:  notice: Initiating action 752: 
start zs95kjg110187_res_start_0 on zs90kppcs1
...
Sep  9 16:13:56 zs95kj crmd[133002]:  notice: Transition 241 
(Complete=81, Pending=0, Fired=0, Skipped=172, Incomplete=341, 
Source=/var/lib/pacemaker/pengine/pe-input-3291.bz2): Stopped

Here I do not see anything in the crmd log about pe-input-3289.bz2 or 
pe-input-3290.bz2. Why is that?
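
One rough way to line up the two logs is to grep for both the pengine 
and crmd transition messages together (just a sketch; the log path and 
exact wording may differ on other setups):

   grep -E 'Calculated Transition|Source=.*pe-input' /var/log/messages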

From the log on zs93KLpcs1, where guest 110187 was running, I do not 
see any message about this resource being stopped after the node lost 
its connection to the cluster.

Any ideas on where to look for the possible cause?

On 11/3/2016 1:02 AM, Ken Gaillot wrote:
> On 11/02/2016 11:17 AM, Niu Sibo wrote:
>> Hi all,
>>
>> I have a general question regarding the fencing logic in Pacemaker.
>>
>> I have set up a three-node cluster with Pacemaker 1.1.13 and the cluster
>> property no-quorum-policy set to ignore. When two nodes lose the NIC that
>> corosync runs on at the same time, it looks like the two nodes are
>> getting fenced one by one, even though I have a fence device defined for
>> each of the three nodes.
>>
>> What should I be expecting in this case?
> It's probably coincidence that the fencing happens serially; there is
> nothing enforcing that for separate fence devices. There are many steps
> in a fencing request, so they can easily take different times to complete.
>
>> I noticed that if a node rejoins the cluster before the cluster starts
>> the fence actions, some resources get activated on two nodes at the
>> same time. This is really not good if the resource happens to be a
>> virtual guest.  Thanks for any suggestions.
> Since you're ignoring quorum, there's nothing stopping the disconnected
> node from starting all resources on its own. It can even fence the other
> nodes, unless the downed NIC is used for fencing. From that node's point
> of view, it's the other two nodes that are lost.
>
> Quorum is the only solution I know of to prevent that. Fencing will
> correct the situation, but it won't prevent it.
>
> See the votequorum(5) man page for various options that can affect how
> quorum is calculated. Also, the very latest version of corosync supports
> qdevice (a lightweight daemon that runs on a host outside the cluster
> strictly for the purposes of quorum).
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
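
(For the record, a minimal corosync.conf sketch of the qdevice setup Ken 
mentions above, assuming a qnetd host named "qnetd-host" running outside 
the cluster; see corosync-qdevice(8) for the authoritative syntax:)

   quorum {
       provider: corosync_votequorum
       device {
           model: net
           net {
               host: qnetd-host
               algorithm: ffsplit
           }
       }
   }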




