[ClusterLabs] Re: Re: Re: Re: Re: Re: Re: [EXT] delaying start of a resource
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Dec 18 02:34:47 EST 2020
>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 18.12.2020 at 08:21 in
message <58579c3b-33ce-a121-5d67-00305f3d7090 at gmail.com>:
> 18.12.2020 10:09, Ulrich Windl wrote:
>>>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 18.12.2020 at 08:01 in
>> message <d79895c5-5b58-a710-8a51-761479b20b85 at gmail.com>:
>>> 17.12.2020 21:30, Ken Gaillot wrote:
>>>>
>>>> This reminded me that some IPMI implementations return "success" for
>>>> commands before they've actually been completed. This is why
>>>> fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds.
>>>>
>>>
>>> But in this case we also do not know whether the command has completed
>>> successfully or not. I'd say the only safe way in this case is to use
>>> power off and have the stonith agent verify that the node is actually
>>> powered off before returning success.
>>
>> As I wrote in my message, the other node showing that a node has left would
>> be an indication that fencing was successful
>
> You got it backwards. Fencing starts when pacemaker gets an indication
> that the other node has left.
>
>> IF there was a valid network connection up to the fencing event. Thus I
>> think a redundant network is rather important. The user should be able to
>> tell whether fencing actually does work; maybe not from syslog, but from
>> other indicators.
>
> Completely wrong. Fencing is needed exactly when there is no possibility
> to get information about the other node and there is no way to verify
> the other node's state using "normal" means.
Andrei: I was not talking about "when fencing is needed", but about "what
may indicate that fencing happened".
>
> A redundant network helps to avoid unnecessary fencing; it is not a
> replacement for fencing.
>
>> Also, if the network outage were simulated using a node-specific blackhole
>> route (blocking just the other node(s)), the node could be queried (for
>> example) by a ping from a third node to see whether and when it actually
>> went down.
>>
>
> And? How should two isolated pacemaker instances now communicate and
> coordinate activity, even if there is connectivity via some of the other
> networks available on the nodes? Using multiple rings utilizing all
> available networks falls under "redundant network".
I'm afraid you misunderstood. See above.
Regards,
Ulrich
>
>> Regards,
>> Ulrich
>>
>>>
>>>> The best thing would be to do some manual testing using ipmitool or
>>>> whatnot to turn off the power, and observe how long it takes from when
>>>> the command returns until the server is actually powered down. Then
>>>> set power_wait to a comfortable margin above that. Or just keep raising
>>>> power_wait until the problem goes away :)
>>>>
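Ken's measure-then-add-a-margin procedure can be sketched the same way. In this minimal Python sketch, `measure_power_off_delay`, `suggest_power_wait`, and the `power_off`/`get_power_state` callbacks are hypothetical; only `power_wait` comes from the thread. It times the interval from the moment the power-off command returns until the status query first reports "off", which assumes the BMC's status reading is trustworthy (if it is not, time the server itself, as Ken suggests). The 2-second default margin and the rounding up to whole seconds are arbitrary choices here.

```python
import math
import time
from typing import Callable, Sequence


def measure_power_off_delay(
    power_off: Callable[[], None],
    get_power_state: Callable[[], str],
    interval: float = 0.5,
    limit: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Seconds from power_off() returning until get_power_state() reads "off"."""
    power_off()              # e.g. a wrapper around an ipmitool power-off call
    start = clock()          # start timing only once the command has returned
    while get_power_state() != "off":
        if clock() - start > limit:
            raise TimeoutError("node still reports power on")
        sleep(interval)
    return clock() - start


def suggest_power_wait(delays: Sequence[float], margin: float = 2.0) -> int:
    """Worst observed delay plus a safety margin, rounded up to whole seconds."""
    return math.ceil(max(delays) + margin)
```

Running the measurement several times and feeding all samples to `suggest_power_wait` bases the value on the worst case rather than a single run.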
>>>
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/