[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: pacemaker with sbd fails to start if node reboots too fast.
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Dec 5 09:11:46 EST 2017
>>> "Gao,Yan" <ygao at suse.com> wrote on 05.12.2017 at 15:04 in message
<f3433dca-d654-0eac-80d6-2f92aeb3e894 at suse.com>:
> On 12/05/2017 12:41 PM, Ulrich Windl wrote:
>>
>>
>>>>> "Gao,Yan" <ygao at suse.com> wrote on 01.12.2017 at 20:36 in message
>> <e49f3c0a-6981-3ab4-a0b0-1e5f49f34a25 at suse.com>:
[...]
>>
>> I meant: There are three delays:
>> 1) The delay until data is on the disk
> It takes several I/Os for the sender to do this: read the device
> header, look up the slot, write the message, and verify that the message
> was written (timeout_io defaults to 3s).
>
> As mentioned, the sender's msgwait timer starts only after the message
> has been verified as written. We just need to make sure stonith-timeout
> is configured sufficiently longer than the sum.
>
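The sum described above can be sketched roughly as follows. This is only an illustration: the function name, the count of four I/O steps (taken from the four steps listed above), and the assumption that each step is bounded by timeout_io are mine, not something sbd reports.

```python
# Hypothetical helper; the name and the step count are illustrative assumptions.
def min_stonith_timeout(timeout_io, msgwait, io_steps=4, margin=0):
    """Rough lower bound (seconds) that stonith-timeout should exceed."""
    # Assumption: each of the sender's I/O steps (read header, look up slot,
    # write message, verify) takes at most timeout_io.
    write_phase = io_steps * timeout_io
    # msgwait only starts counting after the write has been verified.
    return write_phase + msgwait + margin


# Example values only: timeout_io = 3s as quoted above, msgwait = 10s assumed.
print(min_stonith_timeout(timeout_io=3, msgwait=10))
```

stonith-timeout would then be chosen comfortably above the returned value.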
>> 2) Delay until data is read from the disk
> It's already taken into account with msgwait. Since the recipient
> keeps reading in a loop, we don't know exactly when it starts the read
> that picks up this specific message. But once it starts a read, that read
> has to complete within timeout_watchdog, otherwise the watchdog triggers.
> So even in a bad case, the message should be read within 2 *
> timeout_watchdog. That's why the sender has to wait msgwait, which is
> 2 * timeout_watchdog.
>
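The bad case described above can be written out as a small sketch. The function names are hypothetical, and the model (one read that just misses the message, followed by one read that sees it) is a simplification of the argument, not sbd's actual code.

```python
# Hypothetical names; a simplified model of the argument above.
def worst_case_read_delay(timeout_watchdog):
    """Longest time (seconds) until the recipient has read the message."""
    # The message lands just after a read has started, so that read misses
    # it; the read itself may take up to timeout_watchdog.
    missed_read = timeout_watchdog
    # The following read sees the message and must also finish within
    # timeout_watchdog, otherwise the watchdog fires anyway.
    next_read = timeout_watchdog
    return missed_read + next_read


def msgwait_for(timeout_watchdog):
    """msgwait as described above: twice the watchdog timeout."""
    return 2 * timeout_watchdog


# Example value only: a 5s watchdog timeout is assumed here.
assert worst_case_read_delay(5) <= msgwait_for(5)
```

Under this model the sender's msgwait covers the recipient's worst-case read delay exactly, with no extra slack.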
>> 3) Delay until Host was killed
> The kill is triggered basically immediately once the poison pill is read.
Considering that the response time of a SAN disk system with cache is typically only a few microseconds, writing to disk may be even "more immediate" than killing the node via watchdog reset ;-)
So, IMHO, you can't easily say that one is immediate while the other has to be waited for.
Regards,
Ulrich
>
>> A confirmation before 3) could shorten the total wait that includes 2) and 3),
>> right?
> As mentioned in another email, a live node, even one that has indeed come
> back from the dead, cannot actually confirm anything itself, not even
> whether it was ever dead. And successful fencing means the node is
> dead.
>
> Regards,
> Yan
>
>
>>
>> Regards,
>> Ulrich
>>
>>
>>>
>>> Regards,
>>> Yan
>>>
[...]