[ClusterLabs] Antw: Re: crmd: notice: peer_update_callback: Node return implies stonith of node1 (action 34) completed
andrew at beekhof.net
Sun Apr 12 23:26:04 EDT 2015
> On 30 Mar 2015, at 6:36 pm, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>> Andrew Beekhof <andrew at beekhof.net> wrote on 30.03.2015 at 02:30 in message
> <C19EFA41-B4F5-4EAF-8CEC-590ACCF8FAE2 at beekhof.net>:
>>> On 11 Mar 2015, at 5:48 pm, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>>> I was looking into why a node took over from another node and stumbled on
>>> this message. The sequence was:
>>> - loss of LAN connection for ~ 30 seconds
>>> - split brain
>>> - node initiated IPMI stonith
>>> - *BEFORE* IPMI stonith returned LAN connection was back and both
>>> nodes saw each other
>>> - and crmd assumed stonith worked
>>> Is it intentional?
>> Pretty much.
>>> The node was actually rebooted by IPMI after this.
>> Not ideal but also nothing much we can do about it.
> Actually, we have experienced similar situations that were handled suboptimally, IMHO:
> The cluster decided to fence some node while it had no quorum. The node to be fenced was down anyway, but joined the cluster after reboot. THEN the cluster had quorum and fenced the node that just joined the cluster, causing a loss of quorum again...
> I feel that a node freshly joining the cluster should cancel all fencing requests targeted at it.
Normally it will. Logs?
His case was different in that the node came back after the fencing agent had already been called.
A bit late to put the cat back in the bag at that point.
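
To make the distinction concrete, here is a toy model of the two cases. It is only a sketch, not Pacemaker code: the class, the state names and the methods are all hypothetical, and it assumes the only thing that matters is whether the fencing agent has already been called when the target rejoins.

```python
from enum import Enum, auto


class FenceState(Enum):
    PENDING = auto()       # requested, fencing agent not yet called
    IN_FLIGHT = auto()     # fencing agent (e.g. IPMI) already called
    CANCELLED = auto()     # dropped before the agent was called
    ASSUMED_DONE = auto()  # target returned while the agent was running


class FenceRequest:
    """Hypothetical fencing request against a single target node."""

    def __init__(self, target):
        self.target = target
        self.state = FenceState.PENDING

    def dispatch(self):
        # Calling the agent is the point of no return in this model.
        if self.state is FenceState.PENDING:
            self.state = FenceState.IN_FLIGHT

    def on_peer_return(self, node):
        """React to `node` rejoining the cluster; return the new state."""
        if node != self.target:
            return self.state
        if self.state is FenceState.PENDING:
            # Agent not called yet: the request can still be dropped.
            self.state = FenceState.CANCELLED
        elif self.state is FenceState.IN_FLIGHT:
            # Too late to recall the agent; the node may still get rebooted.
            self.state = FenceState.ASSUMED_DONE
        return self.state
```

A request that is still PENDING when node1 rejoins ends up CANCELLED, which is the "normally it will" case. A request that was already dispatched ends up ASSUMED_DONE, which corresponds to Andrei's sequence where the IPMI reboot still happened afterwards.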
>> Users mailing list: Users at clusterlabs.org
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org