[ClusterLabs] Antw: Re: Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

Gao,Yan ygao at suse.com
Tue Dec 5 09:04:29 EST 2017


On 12/05/2017 12:41 PM, Ulrich Windl wrote:
> 
> 
>>>> "Gao,Yan" <ygao at suse.com> schrieb am 01.12.2017 um 20:36 in Nachricht
> <e49f3c0a-6981-3ab4-a0b0-1e5f49f34a25 at suse.com>:
>> On 11/30/2017 06:48 PM, Andrei Borzenkov wrote:
>>> On 30.11.2017 16:11, Klaus Wenninger wrote:
>>>> On 11/30/2017 01:41 PM, Ulrich Windl wrote:
>>>>>
>>>>>>>> "Gao,Yan" <ygao at suse.com> schrieb am 30.11.2017 um 11:48 in Nachricht
>>>>> <e71afccc-06e3-97dd-c66a-1b4bac550c23 at suse.com>:
>>>>>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>>>>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two-node cluster with
>>>>>>> a VM on vSphere using a shared VMDK as SBD. During basic tests of killing
>>>>>>> corosync and forcing STONITH, pacemaker was not started after reboot.
>>>>>>> In the logs I see during boot:
>>>>>>>
>>>>>>> Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly
>>>>>>> just fenced by sapprod01p for sapprod01p
>>>>>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>>>>>>> process (3151) can no longer be respawned,
>>>>>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker
>>>>>>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>>>>>>> stonith with SBD always takes msgwait (at least, visually the host is not
>>>>>>> declared OFFLINE until 120s have passed). But the VM reboots lightning fast
>>>>>>> and is up and running long before the timeout expires.
>>>>> As msgwait was intended for the message to arrive, and not for the reboot
>>>>> time (I guess), this just shows a fundamental problem in SBD design: receipt
>>>>> of the fencing command is not confirmed (other than by seeing the
>>>>> consequences of its execution).
>>>>
>>>> The 2 x msgwait is not for confirmations but for writing the poison-pill
>>>> and for
>>>> having it read by the target-side.
>>>
>>> Yes, of course, but that's not what Ulrich likely intended to say.
>>> msgwait must account for worst-case storage path latency, while in
>>> normal cases it happens much faster. If the fenced node could acknowledge,
>>> after reboot, that it had been killed, the stonith agent could return
>>> success much earlier.
>> How could an alive man be sure he died before? ;)
> 
> I meant: There are three delays:
> 1) The delay until data is on the disk
It takes several I/Os for the sender to do this -- read the device 
header, look up the slot, write the message and verify that the message 
was written (each I/O is bounded by timeout_io, which defaults to 3s).

As mentioned, the sender's msgwait timer starts only after the message 
has been verified to be written. We just need to make sure stonith-timeout 
is configured to be sufficiently longer than the sum.
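
For illustration, a rough back-of-the-envelope sketch of that budget in 
Python, using the example values from this thread. The assumed number of 
sender-side I/Os and the 1.2 safety margin are purely illustrative, not a 
recommendation:

# Rough sketch of the timing budget discussed above (values are the
# examples from this thread; adjust to your own SBD configuration).
timeout_io = 3           # per-I/O timeout of sbd, default 3s
timeout_watchdog = 60    # watchdog timeout used in this thread
timeout_msgwait = 2 * timeout_watchdog   # msgwait, here 120s

# The sender performs several I/Os before its msgwait timer even starts:
# read the device header, look up the target's slot, write the message,
# verify it.  Assume roughly four I/Os, each bounded by timeout_io.
sender_ios = 4
worst_case_write = sender_ios * timeout_io

# stonith-timeout therefore has to be comfortably larger than the sum;
# the 1.2 margin here is an arbitrary illustrative choice.
stonith_timeout = (worst_case_write + timeout_msgwait) * 1.2
print("configure stonith-timeout >= %.0fs" % stonith_timeout)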

> 2) Delay until data is read from the disk
It's already taken into account by msgwait. Since the recipient keeps 
reading in a loop, we don't know exactly when it starts the read that 
will pick up this specific message. But once a read has started, it has 
to complete within timeout_watchdog, otherwise the watchdog triggers. So 
even in a bad case, the message should be read within 2 * timeout_watchdog. 
That's the reason why the sender has to wait msgwait, which is 
2 * timeout_watchdog.
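
A minimal sketch of that bound, assuming (as above) that the recipient 
re-reads its slot in a loop and must finish each read within 
timeout_watchdog or be reset by the watchdog:

timeout_watchdog = 60                    # example value from this thread
msgwait = 2 * timeout_watchdog           # 120s, as configured above

# Worst case: the poison pill lands on disk just after the recipient has
# started reading its slot, so that read misses the message.  That read
# still has to finish within timeout_watchdog (otherwise the watchdog
# fires), and the next read, which does see the message, is bounded the
# same way.
read_that_misses = timeout_watchdog      # upper bound
read_that_sees = timeout_watchdog        # upper bound
worst_case_read_delay = read_that_misses + read_that_sees
assert worst_case_read_delay <= msgwait  # msgwait covers the worst case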

> 3) Delay until Host was killed
The kill is triggered basically immediately once the poison pill is read.
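
For intuition only, the recipient side can be pictured roughly like this. 
This is a simplified model, not the actual sbd code; read_slot, 
pet_watchdog and self_fence are made-up placeholder names:

import time

def watcher_loop(read_slot, pet_watchdog, self_fence, loop_interval=1):
    """Simplified model of the disk watcher: keep polling our slot and
    act on a poison pill as soon as it is seen."""
    while True:
        msg = read_slot()          # must complete before the watchdog fires
        if msg in ("reset", "off"):
            self_fence(msg)        # kill is triggered immediately on read
            return
        pet_watchdog()             # only pet the dog after a successful read
        time.sleep(loop_interval)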

> A confirmation before 3) could shorten the total wait that includes 2) and 3),
> right?
As mentioned in another email, a live node, even one that has indeed come 
back from death, cannot actually confirm about itself whether it was ever 
dead. And a successful fencing means the node is dead.

Regards,
   Yan


> 
> Regards,
> Ulrich
> 
> 
>>
>> Regards,
>>     Yan
>>