[ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.
ygao at suse.com
Fri Dec 1 14:36:12 EST 2017
On 11/30/2017 06:48 PM, Andrei Borzenkov wrote:
> 30.11.2017 16:11, Klaus Wenninger wrote:
>> On 11/30/2017 01:41 PM, Ulrich Windl wrote:
>>>>>> "Gao,Yan" <ygao at suse.com> wrote on 30.11.2017 at 11:48 in message
>>> <e71afccc-06e3-97dd-c66a-1b4bac550c23 at suse.com>:
>>>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two-node cluster of
>>>>> VMs on vSphere using a shared VMDK as the SBD device. During basic tests
>>>>> (killing corosync to force STONITH), pacemaker did not start after reboot.
>>>>> During boot I see in the logs:
>>>>> Nov 22 16:04:56 sapprod01s crmd: crit: We were allegedly just fenced by sapprod01p for sapprod01p
>>>>> Nov 22 16:04:56 sapprod01s pacemakerd: warning: The crmd process (3151) can no longer be respawned,
>>>>> Nov 22 16:04:56 sapprod01s pacemakerd: notice: Shutting down
>>>>>
>>>>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>>>>> stonith with SBD always takes msgwait (at least, visually the host is not
>>>>> declared OFFLINE until 120s have passed). But the VM reboots lightning fast
>>>>> and is up and running long before the timeout expires.
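The race described in the report can be sketched as a small timing model. This is a simplified sketch, not how sbd or pacemaker are implemented: it assumes the fencing node just waits msgwait after writing the poison pill, that the target picks the pill up almost immediately, and that `reboot_s` (a hypothetical parameter; the report only says the VM reboots "lightning fast") is the time until pacemaker runs again. `SbdTimeouts`, `rejoins_too_early` and the other names are illustrative only, and the twice-the-watchdog check reflects the usual sbd guidance that msgwait be about double the watchdog timeout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SbdTimeouts:
    watchdog_s: float  # SBD watchdog timeout (60 s in the report)
    msgwait_s: float   # SBD msgwait timeout (120 s in the report)

    def follows_twice_watchdog_rule(self) -> bool:
        # Assumed rule of thumb: msgwait should be at least 2 x watchdog.
        return self.msgwait_s >= 2 * self.watchdog_s


def fence_confirmed_at(fence_start_s: float, t: SbdTimeouts) -> float:
    # Simplified model: the fencing node writes the poison pill and then
    # only waits msgwait; nothing is acknowledged by the target.
    return fence_start_s + t.msgwait_s


def node_back_at(fence_start_s: float, reboot_s: float) -> float:
    # Assumption: the target reads the pill almost at once and pacemaker
    # is running again reboot_s seconds after fencing started.
    return fence_start_s + reboot_s


def rejoins_too_early(fence_start_s: float, reboot_s: float, t: SbdTimeouts) -> bool:
    """True if the node is back before its peer treats the fencing as done."""
    return node_back_at(fence_start_s, reboot_s) < fence_confirmed_at(fence_start_s, t)


if __name__ == "__main__":
    report = SbdTimeouts(watchdog_s=60, msgwait_s=120)
    print(report.follows_twice_watchdog_rule())          # True
    print(rejoins_too_early(0, reboot_s=30, t=report))    # True (hypothetical 30 s reboot)
    print(rejoins_too_early(0, reboot_s=150, t=report))   # False
```

With the reported 60s/120s timeouts, any reboot shorter than 120s puts the node back before its peer considers the fencing finished, which is consistent with the "allegedly just fenced" message in the log.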
>>> As msgwait was intended for the message to arrive, and not for the reboot time (I guess), this just shows a fundamental problem in SBD design: receipt of the fencing command is not confirmed (other than by seeing the consequences of its execution).
>> The 2 x msgwait is not for confirmations but for writing the poison pill
>> and for having it read by the target side.
> Yes, of course, but that's not what Ulrich likely intended to say.
> msgwait must account for worst-case storage path latency, while in
> normal cases delivery happens much faster. If the fenced node could
> acknowledge having been killed after reboot, the stonith agent could
> return success much
How could a living man be sure he had died before? ;)
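Andrei's suggestion can be expressed as a change to how long the stonith agent blocks. SBD has no such acknowledgement; `fence_duration_s` and `ack_after_s` are hypothetical names that model the proposal only, assuming an acknowledgement could be trusted at all.

```python
from typing import Optional


def fence_duration_s(msgwait_s: float, ack_after_s: Optional[float] = None) -> float:
    """Hypothetical: how long the stonith agent would block if the fenced
    node could acknowledge its own reset after rebooting.

    Without an ack the agent falls back to waiting the full msgwait,
    which is the only behaviour the thread describes for SBD today.
    """
    if ack_after_s is None:
        return msgwait_s
    # An ack arriving later than msgwait gains nothing.
    return min(msgwait_s, ack_after_s)


if __name__ == "__main__":
    print(fence_duration_s(120))        # 120: plain msgwait
    print(fence_duration_s(120, 35))    # 35: hypothetical ack after a fast reboot
    print(fence_duration_s(120, 300))   # 120: a late ack does not help
```

The hard part is the one raised in the reply: the shortcut is only safe if a running node can prove it really went through a reset, and the sketch simply assumes that it can.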
> Users mailing list: Users at clusterlabs.org
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org