[ClusterLabs] Antw: Re: What is the logic when two nodes are down at the same time and need to be fenced

Niu Sibo niusibo at linux.vnet.ibm.com
Tue Nov 8 09:14:14 EST 2016



On 11/8/2016 5:08 PM, Ulrich Windl wrote:
>>>> Niu Sibo <niusibo at linux.vnet.ibm.com> wrote on 07.11.2016 at 16:59 in
> message <5820A4CC.9030001 at linux.vnet.ibm.com>:
>> Hi Ken,
>>
>> Thanks for the clarification. Now I have another real problem that needs
>> your advice.
>>
>> The cluster consists of 5 nodes, and one of the nodes had a 1-second
>> network failure, which resulted in one of the VirtualDomain resources
>> starting on two nodes at the same time. The cluster property
>> no_quorum_policy is set to stop.
>>
>> At 16:13:34, this happened:
>> 16:13:34 zs95kj attrd[133000]:  notice: crm_update_peer_proc: Node
>> zs93KLpcs1[5] - state is now lost (was member)
>> 16:13:34 zs95kj corosync[132974]:  [CPG   ] left_list[0]
>> group:pacemakerd\x00, ip:r(0) ip(10.20.93.13) , pid:28721
>> 16:13:34 zs95kj crmd[133002]: warning: No match for shutdown action on 5
> Usually the node would be fenced now. In the meantime the node might _try_ to stop the resources.
In my case, this 1-second network loss on zs93KLpcs1 didn't result in a
fence action against zs93KLpcs1.

Which node might _try_ to stop the resource? The DC or the node that
lost the connection to the cluster?
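As a side note, for anyone trying to reproduce this: the current DC and
the policy value can be confirmed with something along these lines
(generic commands, not output from my cluster; note that pcs spells the
property with dashes):

# crm_mon -1 | grep "Current DC"
# pcs property show no-quorum-policy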
>
>> 16:13:34 zs95kj attrd[133000]:  notice: Removing all zs93KLpcs1
>> attributes for attrd_peer_change_cb
>> 16:13:34 zs95kj corosync[132974]:  [CPG   ] left_list_entries:1
>> 16:13:34 zs95kj crmd[133002]:  notice: Stonith/shutdown of zs93KLpcs1
>> not matched
>> ...
>> 16:13:35 zs95kj attrd[133000]:  notice: crm_update_peer_proc: Node
>> zs93KLpcs1[5] - state is now member (was (null))
> Where are the logs from the other node? I don't see where resources are _started_.

From the log of zs93kl where guest 110187 was started:

11:41:19 [28727] zs93kl       crmd: (       lrm.c:2392  )  notice: 
process_lrm_event:    Operation zs95kjg110187_res_start_0: ok 
(node=zs93KLpcs1, call=1249, rc=0, cib-update=837, confirmed=true)

This guest had been running on zs93KL from 11:41 until 16:13:34, when
zs93KL lost its connection to the cluster for 1 second. After this
short break, the DC decided to start this guest on another node,
zs90kp, which you can see from the logs below. However, zs93KL still
has the following log:

16:14:17 [180373] zs93kl       crmd: (     utils.c:1942  )   debug: 
create_operation_update:     do_update_resource: Updating resource 
zs95kjg110187_res after monitor op complete (interval=0)
16:14:17 [180373] zs93kl       crmd: (       lrm.c:2392  )  notice: 
process_lrm_event:   Operation zs95kjg110187_res_monitor_0: ok 
(node=zs93KLpcs1, call=1655, rc=0, cib-update=216, confirmed=true)
16:14:17 [180373] zs93kl       crmd: (       lrm.c:196   )   debug: 
update_history_cache:        Updating history for 'zs95kjg110187_res' 
with monitor op
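For completeness, the node where the cluster currently believes this
guest is active, and the operations recorded for it, can be dumped with
something like this (resource name taken from the logs above):

# crm_resource --resource zs95kjg110187_res --locate
# crm_resource --resource zs95kjg110187_res --list-operations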

>>   From the DC:
>> [root at zs95kj ~]# crm_simulate --xml-file
>> /var/lib/pacemaker/pengine/pe-input-3288.bz2 |grep 110187
>>    zs95kjg110187_res      (ocf::heartbeat:VirtualDomain): Started
>> zs93KLpcs1     <---------- This is the baseline where everything works normally
>>
>> [root at zs95kj ~]# crm_simulate --xml-file
>> /var/lib/pacemaker/pengine/pe-input-3289.bz2 |grep 110187
>>    zs95kjg110187_res      (ocf::heartbeat:VirtualDomain): Stopped
>> <----------- Here the node zs93KLpcs1 lost its network for 1 second and
>> resulted in this state.
>>
>> [root at zs95kj ~]# crm_simulate --xml-file
>> /var/lib/pacemaker/pengine/pe-input-3290.bz2 |grep 110187
>>    zs95kjg110187_res      (ocf::heartbeat:VirtualDomain): Stopped
>>
>> [root at zs95kj ~]# crm_simulate --xml-file
>> /var/lib/pacemaker/pengine/pe-input-3291.bz2 |grep 110187
>>    zs95kjg110187_res      (ocf::heartbeat:VirtualDomain): Stopped
>>
>>
>>   From the DC's pengine log, it has:
>> 16:05:01 zs95kj pengine[133001]:  notice: Calculated Transition 238:
>> /var/lib/pacemaker/pengine/pe-input-3288.bz2
>> ...
>> 16:13:41 zs95kj pengine[133001]:  notice: Start
>> zs95kjg110187_res#011(zs90kppcs1)
>> ...
>> 16:13:41 zs95kj pengine[133001]:  notice: Calculated Transition 239:
>> /var/lib/pacemaker/pengine/pe-input-3289.bz2
>>
>>   From the DC's CRMD log, it has:
>> Sep  9 16:05:25 zs95kj crmd[133002]:  notice: Transition 238
>> (Complete=48, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-3288.bz2): Complete
>> ...
>> Sep  9 16:13:42 zs95kj crmd[133002]:  notice: Initiating action 752:
>> start zs95kjg110187_res_start_0 on zs90kppcs1
>> ...
>> Sep  9 16:13:56 zs95kj crmd[133002]:  notice: Transition 241
>> (Complete=81, Pending=0, Fired=0, Skipped=172, Incomplete=341,
>> Source=/var/lib/pacemaker/pengine/pe-input-3291.bz2): Stopped
>>
>> Here I do not see any log about pe-input-3289.bz2 and pe-input-3290.bz2.
>> Why is this?
>>
>>   From the log on zs93KLpcs1 where guest 110187 was running, I do not see
>> any message regarding stopping this resource after it lost its
>> connection to the cluster.
>>
>> Any ideas where to look for possible cause?
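For reference, the saved PE inputs can be replayed to see exactly which
actions the policy engine scheduled in those transitions; roughly like
this (file names taken from the DC logs above):

# crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-3289.bz2 -VV
# crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-3290.bz2 -VV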
>>
>> On 11/3/2016 1:02 AM, Ken Gaillot wrote:
>>> On 11/02/2016 11:17 AM, Niu Sibo wrote:
>>>> Hi all,
>>>>
>>>> I have a general question regarding the fence logic in Pacemaker.
>>>>
>>>> I have set up a three-node cluster with Pacemaker 1.1.13 and the cluster
>>>> property no_quorum_policy set to ignore. When two nodes lose the NIC
>>>> that corosync runs on at the same time, it looks like the two nodes
>>>> are getting fenced one by one, even though I have three fence devices
>>>> defined, one for each of the nodes.
>>>>
>>>> What should I be expecting in this case?
>>> It's probably coincidence that the fencing happens serially; there is
>>> nothing enforcing that for separate fence devices. There are many steps
>>> in a fencing request, so they can easily take different times to complete.
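For reference, the timing of the fence operations can be checked from
the fencing history on any node, with something roughly like:

# stonith_admin --history '*' --verbose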
>>>
>>>> I noticed that if the node rejoins the cluster before the cluster starts
>>>> the fence actions, some resources will get activated on two nodes at the
>>>> same time. This is really not good if the resource happens to be a
>>>> virtual guest.  Thanks for any suggestions.
>>> Since you're ignoring quorum, there's nothing stopping the disconnected
>>> node from starting all resources on its own. It can even fence the other
>>> nodes, unless the downed NIC is used for fencing. From that node's point
>>> of view, it's the other two nodes that are lost.
>>>
>>> Quorum is the only solution I know of to prevent that. Fencing will
>>> correct the situation, but it won't prevent it.
>>>
>>> See the votequorum(5) man page for various options that can affect how
>>> quorum is calculated. Also, the very latest version of corosync supports
>>> qdevice (a lightweight daemon that runs on a host outside the cluster
>>> strictly for the purposes of quorum).
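For reference, on a recent corosync/pcs the qdevice mentioned above can
be added roughly like this ('qnetd-host.example.com' is a placeholder
for a host running corosync-qnetd outside the cluster, and the
corosync-qdevice package must be installed on the cluster nodes):

# pcs quorum device add model net host=qnetd-host.example.com algorithm=ffsplit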




