[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Dec 18 02:34:47 EST 2020


>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 18.12.2020 at 08:21 in
message <58579c3b-33ce-a121-5d67-00305f3d7090 at gmail.com>:
> 18.12.2020 10:09, Ulrich Windl wrote:
>>>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 18.12.2020 at 08:01 in
>> message <d79895c5-5b58-a710-8a51-761479b20b85 at gmail.com>:
>>> 17.12.2020 21:30, Ken Gaillot wrote:
>>>>
>>>> This reminded me that some IPMI implementations return "success" for
>>>> commands before they've actually been completed. This is why
>>>> fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds.
>>>>
>>>
>>> But in this case we also do not know whether the command has completed
>>> successfully or not. I'd say in this case the only safe way is to use
>>> poweroff and verify in the stonith agent that the node is actually powered
>>> off before returning success.
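
A minimal sketch of the "verify before returning success" idea, in Python.
It is not taken from fence_ipmilan or any other agent: the function names,
the lanplus interface and passing credentials on the command line are
assumptions made for illustration. It relies only on "ipmitool chassis power
status", which prints a line such as "Chassis Power is off". Whether the BMC's
own status report can be trusted is a separate question.

```python
import subprocess
import time


def ipmi_power_status(host, user, password):
    """Return 'on' or 'off' as reported by `ipmitool chassis power status`.

    Assumption: the BMC is reachable over the lanplus interface.
    """
    out = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
         "chassis", "power", "status"],
        capture_output=True, text=True, check=True,
    ).stdout
    # ipmitool prints e.g. "Chassis Power is off"; keep the last word.
    return out.strip().rsplit(" ", 1)[-1].lower()


def wait_until_off(get_status, timeout=60.0, interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns 'off'.

    Returns True once 'off' is seen, False if the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "off":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A wrapper built on this would issue "chassis power off", then call
wait_until_off(lambda: ipmi_power_status(host, user, password)) and report
success only if that returns True.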
>> 
>> As I wrote in my message, the other node showing that a node has left would be
>> an indication that fencing was successful
> 
> You got it backwards. Fencing starts when pacemaker gets an indication
> that the other node has left.
> 
>> IF there was a valid network
>> connection up to the fencing event. Thus I think a redundant network is rather
>> important. The user should be able to tell whether fencing actually does work;
>> maybe not from syslog, but from other indicators.
> 
> Completely wrong. Fencing is needed exactly when there is no possibility
> to get information about the other node and there is no way to verify
> the other node's state using "normal" means.

Andrei: I was not talking about "when fencing is needed", but about "what
may indicate that fencing happened".

> 
> A redundant network helps to avoid unnecessary fencing; it is not a
> replacement for fencing.
> 
>> Also if the network outage were simulated by using a node-specific blackhole
>> route (blocking just the other node(s)), the node could be queried (for
>> example) by a ping from a third node to see whether and when it actually went
>> down.
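
A minimal sketch of the third-node check described above, in Python. It
assumes the outage is simulated on one cluster node with something like
"ip route add blackhole 192.0.2.12/32" (a documentation address, not a real
host), and that this script runs on a third machine that can still reach the
node being fenced. The function names and the "three misses in a row"
threshold are assumptions; the ping flags are those of Linux iputils.

```python
import subprocess
import time


def ping_once(host, wait_s=1):
    """True if a single ICMP echo request to host gets a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(wait_s), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch_for_down(probe, misses_needed=3, interval=1.0, max_probes=600,
                   clock=time.time, sleep=time.sleep):
    """Return the time of the first miss in a run of consecutive misses.

    A node counts as down once probe() has failed misses_needed times in a
    row. Returns None if that never happens within max_probes probes.
    """
    first_miss = None
    misses = 0
    for _ in range(max_probes):
        if probe():
            # Any reply resets the run of misses.
            misses, first_miss = 0, None
        else:
            if misses == 0:
                first_miss = clock()
            misses += 1
            if misses >= misses_needed:
                return first_miss
        sleep(interval)
    return None
```

Running watch_for_down(lambda: ping_once("192.0.2.12")) on the third node
while the test is in progress gives a timestamp that can be compared with the
time at which the fencing command reported success.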
>> 
> 
> And? How should two isolated pacemaker instances now communicate and
> coordinate activity even if there is connectivity via some of the other
> networks available on the nodes? Using multiple rings utilizing all
> available networks falls under "redundant network".

I'm afraid you misunderstood. See above.

Regards,
Ulrich

> 
>> Regards,
>> Ulrich
>> 
>>>
>>>> The best thing would be to do some manual testing using ipmitool or
>>>> whatnot to turn off the power, and observe how long it takes between
>>>> when the command returns and the server actually is powered down. Then
>>>> set power_wait to a comfortable margin above that. Or just keep raising
>>>> power_wait until the problem goes away :)
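
The manual test suggested above could be scripted roughly as follows. This is
only a sketch: power_off and get_status are hypothetical callables supplied by
the tester (for example thin wrappers around ipmitool, or a ping check as the
"is it really down" indicator), and the margin factor of 2 is an arbitrary
choice, not a recommendation from the agent's documentation.

```python
import math
import time


def measure_off_delay(power_off, get_status, timeout=120.0, interval=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Seconds between power_off() returning and get_status() reporting 'off'.

    Returns None if the node still reports 'on' after timeout seconds.
    """
    power_off()
    returned_at = clock()
    while clock() - returned_at <= timeout:
        if get_status() == "off":
            return clock() - returned_at
        sleep(interval)
    return None


def suggest_power_wait(delays, margin=2.0):
    """Largest observed delay times margin, rounded up to whole seconds."""
    return math.ceil(max(delays) * margin)
```

Repeating the measurement several times and feeding the results to
suggest_power_wait gives a starting value for power_wait that sits above the
slowest run observed.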
>>>>
>>>
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/ 