[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Dec 5 14:11:46 UTC 2017



>>> "Gao,Yan" <ygao at suse.com> wrote on 05.12.2017 at 15:04 in message
<f3433dca-d654-0eac-80d6-2f92aeb3e894 at suse.com>:
> On 12/05/2017 12:41 PM, Ulrich Windl wrote:
>> 
>> 
>>>>> "Gao,Yan" <ygao at suse.com> wrote on 01.12.2017 at 20:36 in message
>> <e49f3c0a-6981-3ab4-a0b0-1e5f49f34a25 at suse.com>:

[...]
>> 
>> I meant: There are three delays:
>> 1) The delay until data is on the disk
> It takes several IOs for the sender to do this: read the device 
> header, look up the slot, write the message, and verify that the 
> message is written (timeout_io defaults to 3s).
> 
> As mentioned, the msgwait timer of the sender starts only after the 
> message has been verified to be written. We just need to make sure that 
> stonith-timeout is configured sufficiently longer than the sum.
> 
>> 2) Delay until data is read from the disk
> It's already taken into account with msgwait. Considering that the 
> recipient keeps reading in a loop, we don't know exactly when it starts 
> to read this specific message. But once it starts a read, the read has 
> to complete within timeout_watchdog, otherwise the watchdog triggers. So 
> even in a bad case, the message should be read within 2 * 
> timeout_watchdog. That's the reason why the sender has to wait msgwait, 
> which is 2 * timeout_watchdog.
> 
>> 3) Delay until the host is killed
> The kill is triggered basically immediately once the poison pill is read.

Considering that the response time of a SAN disk system with cache is typically only a few microseconds, writing to disk may be even "more immediate" than killing the node via watchdog reset ;-)
So IMHO you can't easily say that one is immediate while the other has to be waited for.
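
To put rough numbers on the delays discussed above, here is a minimal sketch in Python. It assumes the sender needs four IOs (read header, look up slot, write, verify) and that each one takes at most timeout_io; the IO count, the function name and the example timeout_watchdog of 5s are illustrative assumptions, not values taken from the sbd source. Only the 3s timeout_io default and msgwait = 2 * timeout_watchdog come from this thread.

```python
def min_stonith_timeout(timeout_watchdog, timeout_io=3.0, sender_ios=4):
    """Lower bound (seconds) that stonith-timeout has to exceed.

    sender_ios is an assumed count of IOs on the sender side
    (read header, look up slot, write message, verify message).
    """
    # Delay 1: worst case until the message is verified on disk.
    write_phase = sender_ios * timeout_io
    # Delays 2 and 3: covered by msgwait, which only starts
    # after the message has been verified.
    msgwait = 2 * timeout_watchdog
    return write_phase + msgwait


# Example with an assumed timeout_watchdog of 5s.
bound = min_stonith_timeout(5)
print(f"stonith-timeout must be longer than {bound}s")
```

With these assumed values the write phase contributes 12s and msgwait 10s, so stonith-timeout would have to be set above 22s, plus whatever safety margin one considers appropriate.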

Regards,
Ulrich

> 
>> A confirmation before 3) could shorten the total wait that includes 2) and 3),
>> right?
> As mentioned in another email, a live node, even one that has indeed 
> come back from the dead, cannot actually confirm its own death, or even 
> confirm whether it was ever dead. And successful fencing means that the 
> node is dead.
> 
> Regards,
>    Yan
> 
> 
>> 
>> Regards,
>> Ulrich
>> 
>> 
>>>
>>> Regards,
>>>     Yan
>>>
[...]




