[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Dec 18 02:09:45 EST 2020


>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 18.12.2020 at 08:01 in
message <d79895c5-5b58-a710-8a51-761479b20b85 at gmail.com>:
> 17.12.2020 21:30, Ken Gaillot wrote:
>> 
>> This reminded me that some IPMI implementations return "success" for
>> commands before they've actually been completed. This is why
>> fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds.
>> 
> 
> But in this case we also do not know whether the command has completed
> successfully or not. I'd say in this case the only safe way is to use
> poweroff and verify in the stonith agent that the node is actually powered
> off before returning success.

As I wrote in my message, the other node showing that a node has left would be
an indication that fencing was successful IF there was a valid network
connection up to the fencing event. Thus I think a redundant network is rather
important. The user should be able to tell whether fencing actually works;
maybe not from syslog, but from other indicators.
Also, if the network outage were simulated with a node-specific blackhole
route (blocking just the other node(s)), the node could still be queried (for
example, by a ping from a third node) to see whether and when it actually went
down.
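
A rough sketch of such a check, to be run on the third node, might look like
the following. It is only an illustration: the helper names (ping_once,
wait_until_down) are made up here, the ping flags assume Linux iputils, and
three missed pings in a row is an arbitrary threshold for "down".

```python
import subprocess
import time
from typing import Callable, Optional


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo is answered.

    Assumes Linux iputils ping (-c count, -W reply timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_down(
    probe: Callable[[], bool],
    misses_required: int = 3,
    interval_s: float = 1.0,
    max_wait_s: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll probe() until it fails misses_required times in a row.

    Returns the seconds elapsed (since the call started) at the first miss
    of that run, or None if the node still answered when max_wait_s expired.
    """
    start = clock()
    misses = 0
    first_miss = None
    while clock() - start < max_wait_s:
        if probe():
            # Node answered: discard any earlier, isolated misses.
            misses = 0
            first_miss = None
        else:
            if misses == 0:
                first_miss = clock() - start
            misses += 1
            if misses >= misses_required:
                return first_miss
        sleep(interval_s)
    return None
```

Started just before the test, e.g. wait_until_down(lambda: ping_once("node2"))
with "node2" standing in for the node to be fenced, it gives the approximate
time at which the node stopped answering, which can then be compared with the
time the fencing was reported as successful.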

Regards,
Ulrich

> 
>> The best thing would be to do some manual testing using ipmitool or
>> whatnot to turn off the power, and observe how long it takes between
>> when the command returns and the server actually is powered down. Then
>> set power_wait to a comfortable margin above that. Or just keep raising
>> power_wait until the problem goes away :)
>> 
> 
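
The manual test Ken describes could be scripted roughly as below. This is a
sketch under assumptions, not a tested procedure: it relies on ipmitool's
"chassis power off" and "chassis power status" subcommands, the function names
are invented for this example, and the connection options (host, user,
interface) are left to the caller. Polling adds its own delay, so the result is
an upper estimate with a resolution of about poll_s.

```python
import subprocess
import time
from typing import Callable, List, Optional


def run_ipmitool(args: List[str]) -> str:
    """Run ipmitool with the given arguments and return its stdout."""
    done = subprocess.run(
        ["ipmitool", *args], capture_output=True, text=True, check=True
    )
    return done.stdout


def measure_power_off_delay(
    conn: List[str],
    run: Callable[[List[str]], str] = run_ipmitool,
    poll_s: float = 0.5,
    max_wait_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Seconds between 'power off' returning and the BMC reporting 'off'.

    conn holds the caller's ipmitool connection options.
    Returns None if the chassis still reports power on after max_wait_s.
    """
    run(conn + ["chassis", "power", "off"])
    returned_at = clock()  # the command has returned "success" here
    while clock() - returned_at < max_wait_s:
        status = run(conn + ["chassis", "power", "status"])
        # ipmitool prints "Chassis Power is on" or "Chassis Power is off"
        if "is off" in status.lower():
            return clock() - returned_at
        sleep(poll_s)
    return None
```

For example, measure_power_off_delay(["-I", "lanplus", "-H", "bmc-node2",
"-U", "admin", "-E"]) with a placeholder BMC host and user (-E takes the
password from the IPMI_PASSWORD environment variable). power_wait would then
be set with a comfortable margin above the largest value seen over several
runs.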