[ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

Gao,Yan ygao at suse.com
Fri Dec 1 14:36:12 EST 2017


On 11/30/2017 06:48 PM, Andrei Borzenkov wrote:
> 30.11.2017 16:11, Klaus Wenninger wrote:
>> On 11/30/2017 01:41 PM, Ulrich Windl wrote:
>>>
>>>>>> "Gao,Yan" <ygao at suse.com> wrote on 30.11.2017 at 11:48 in message
>>> <e71afccc-06e3-97dd-c66a-1b4bac550c23 at suse.com>:
>>>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>>>>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>>>>> corosync and forcing STONITH pacemaker was not started after reboot.
>>>>> In logs I see during boot
>>>>>
>>>>> Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly
>>>>> just fenced by sapprod01p for sapprod01p
>>>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>>>>> process (3151) can no longer be respawned,
>>>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
>>>>> Pacemaker
>>>>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>>>>> stonith with SBD always takes msgwait (at least, visually host is not
>>>>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>>>>> and is up and running long before timeout expires.
>>> As msgwait was intended for the message to arrive, and not for the reboot time (I guess), this just shows a fundamental problem in SBD design: Receipt of the fencing command is not confirmed (other than by seeing the consequences of its execution).
>>
>> The 2 x msgwait is not for confirmations but for writing the poison-pill
>> and for having it read by the target-side.
> 
> Yes, of course, but that's likely not what Ulrich intended to say.
> msgwait must account for worst case storage path latency, while in
> normal cases it happens much faster. If fenced node could acknowledge
> having been killed after reboot, stonith agent could return success much
> earlier.
How could an alive man be sure he died before? ;)
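
To make the timing in the original report concrete, here is a minimal sketch
in Python. It is not sbd or pacemaker code; it only models the race as I read
it from this thread: the fencing side waits msgwait (120s in the report)
before it treats the fence as done, while the VM is back up much sooner. The
30s reboot time and both function names are assumptions for illustration.

```python
# Sketch only: models the timing discussed in this thread,
# not actual sbd or pacemaker code.

MSGWAIT_S = 120  # msgwait from the original report
REBOOT_S = 30    # hypothetical: how long the fenced VM needs to boot again


def rejoins_during_fencing(msgwait_s: int, reboot_s: int) -> bool:
    """True if the target is back up before the fencer's msgwait has elapsed."""
    return reboot_s < msgwait_s


def min_start_delay(msgwait_s: int, reboot_s: int) -> int:
    """Seconds the rebooted node would have to hold off starting the cluster
    stack so that it does not come up before msgwait has elapsed."""
    return max(0, msgwait_s - reboot_s)


if __name__ == "__main__":
    print(rejoins_during_fencing(MSGWAIT_S, REBOOT_S))
    print(min_start_delay(MSGWAIT_S, REBOOT_S))
```

With these numbers the node is back while the fence is still pending, which
would match the "We were allegedly just fenced" message in the logs above.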

Regards,
   Yan

> 



