[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

Andrei Borzenkov arvidjaar at gmail.com
Fri Dec 18 02:21:24 EST 2020


18.12.2020 10:09, Ulrich Windl wrote:
>>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 18.12.2020 at 08:01 in
> message <d79895c5-5b58-a710-8a51-761479b20b85 at gmail.com>:
>> 17.12.2020 21:30, Ken Gaillot wrote:
>>>
>>> This reminded me that some IPMI implementations return "success" for
>>> commands before they've actually been completed. This is why
>>> fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds.
>>>
>>
>> But in this case we also do not know whether the command has completed
>> successfully or not. I'd say in this case the only safe way is to use
>> poweroff and verify in the stonith agent that the node is actually
>> powered off before returning success.
> 
> As I wrote in my message, the other node showing that a node has left would be
> an indication that fencing was successful

You got it backwards. Fencing starts when Pacemaker gets an indication
that the other node has left.
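
To make the ordering concrete, here is a minimal Python sketch of the
"power off, then verify" approach mentioned above. It is not how any
existing fence agent is implemented: fence_and_verify, power_off and
power_status are hypothetical names, and the two callables stand in for
whatever actually talks to the BMC. The assumption is that power_status
returns the string "off" once the node is really down.

```python
import time
from typing import Callable


def fence_and_verify(
    power_off: Callable[[], None],      # hypothetical: sends the power-off command
    power_status: Callable[[], str],    # hypothetical: returns "on" or "off"
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Send power-off, then report success only once the node reads "off".

    The return value of the power-off command itself is ignored on purpose:
    the BMC may answer "success" before the command has completed.
    """
    power_off()
    deadline = clock() + timeout
    while True:
        if power_status() == "off":
            return True          # verified: safe to report fencing success
        if clock() >= deadline:
            return False         # never confirmed: fencing must count as failed
        sleep(interval)
```

The point of the design is that success depends only on the observed
power state, never on membership information from the cluster.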

> IF there was a valid network
> connection up to the fencing event. Thus I think a redundant network is rather
> important. The user should be able to tell whether fencing actually does work;
> maybe not from syslog, but from other indicators.

Completely wrong. Fencing is needed exactly when there is no possibility
of getting information about the other node and no way to verify the
other node's state using "normal" means.

A redundant network helps to avoid unnecessary fencing; it is not a
replacement for fencing.

> Also if the network outage were simulated by using a node-specific blackhole
> route (blocking just the other node(s)), the node could be queried (for
> example) by a ping from a third node to see whether and when it actually went
> down.
> 

And? How should two isolated Pacemaker instances now communicate and
coordinate activity, even if there is connectivity via some of the other
networks available on the nodes? Using multiple rings that utilize all
available networks falls under "redundant network".

> Regards,
> Ulrich
> 
>>
>>> The best thing would be to do some manual testing using ipmitool or
>>> whatnot to turn off the power, and observe how long it takes between
>>> when the command returns and the server actually is powered down. Then
>>> set power_wait to a comfortable margin above that. Or just keep raising
>>> power_wait until the problem goes away :)
>>>
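
The manual test described above could be scripted roughly as follows.
This is only a sketch under assumptions: measure_poweroff_lag and its
two callables are hypothetical names, not part of ipmitool or any fence
agent, and the measured value is only as precise as the polling interval.

```python
import time
from typing import Callable, Optional


def measure_poweroff_lag(
    power_off: Callable[[], None],      # hypothetical: sends the power-off command
    power_status: Callable[[], str],    # hypothetical: returns "on" or "off"
    timeout: float = 60.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Return seconds between the command returning and the node reading "off".

    Returns None if the node is still not off after `timeout` seconds.
    """
    power_off()
    returned_at = clock()               # moment the command claimed success
    while clock() - returned_at <= timeout:
        if power_status() == "off":
            return clock() - returned_at
        sleep(interval)
    return None
```

Running this a number of times and taking the worst observed lag plus a
margin would give a candidate value for power_wait.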
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> ClusterLabs home: https://www.clusterlabs.org/ 