[ClusterLabs] Antw: Re: Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

Fri Mar 5 12:33:38 EST 2021

On 2021-03-05 12:26 p.m., Klaus Wenninger wrote:
> On 3/5/21 6:04 PM, Digimer wrote:
>> On 2021-03-05 2:14 a.m., Ulrich Windl wrote:
>>>>> How would the fencing be confirmed? I don't know.
>>>> It's part of the FenceAgentAPI. The cluster invokes the fence agent,
>>>> passes in variable=value pairs on STDIN, and waits for the agent to
>>>> exit. It reads the agent's exit code and uses that to determine success
>>>> or failure.
>>> But the agent "acting remote" cannot be sure the "remote end" was killed,
>>> specifically when the network connection seems dead.
>>> I see that in the IPMI case you have a separate connection allowing
>>> "out-of-band signaling", but in the general case that would not be possible.
>> To elaborate on Klaus's reply;
>>
>> The cluster has no control over how the fence agent works; it can only
>> dictate the API and expect the fence agent is implemented in a sane way.
>> If your agent returns success, but the node wasn't confirmed off
>> properly in the agent, you will get a split-brain and that will be no
>> fault of the cluster itself.
>>
>> Speaking to the "remote end" part;
>>
>> All good fence agents need to work regardless of the state of the target
>> node. If, somehow, a fence agent needs the target to be in some sort of
>> defined state, it is a critically flawed fence agent. A classic example
>> of this is the often-requested "ssh fence agent" (and it's why such an
>> agent doesn't exist).
>>
>> So your fence agent must be able to work out of band, by definition and
>> design. When you call an IPMI BMC, you are effectively talking to a
>> different mini computer on the target. Even then, if the mainboard
>> utterly dies and takes the BMC with it, it will fail to fence as well.
>> This is why at Alteeve we always have a backup fence method, switched
>> PDUs on different switches from the IPMI BMC connections.
>>
>> Fencing really is critical, and as such, it should be certain to work,
>> and ideally, have a backup fence method. So if you find that your
>> fence-azure agent isn't reliable, and you can use SBD as Klaus
>> mentioned, you can configure fence-sbd as a backup method to fence-azure.
>>
> Nothing to add - to the point as usual - but that the statement from
> Ulrich looked general - not necessarily Azure specific - and thus my
> comment was as well.
> Just wanted to state that I didn't advertise SBD as a fencing method
> for Azure. SBD needs a reliable watchdog and afaik softdog is the only
> watchdog you have on Azure (maybe different for certain BareMetal
> offerings).
> If you consider that reliable enough you have to negotiate with your own
> conscience or the provider of your distribution ;-)
> 
> Klaus

I know nothing of Azure. If you don't have a hardware watchdog, fence-sbd
is not reliable.
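
For anyone who wants the contract spelled out, here is a rough Python
sketch of what was described above: options go to the agent as
variable=value lines on STDIN, the caller waits for the agent to exit,
and only exit code 0 counts as a confirmed fence. This is not Pacemaker's
code. The function names (parse_fence_args, run_agent,
fence_with_fallback), the 60 second timeout, and skipping "#" lines are
my own assumptions for illustration, and the loop over methods only
mimics the "primary, then backup" ordering; the cluster handles that
through its own fencing configuration.

```python
import subprocess


def parse_fence_args(stream):
    """Agent side: read variable=value pairs, one per line, from a stream."""
    args = {}
    for raw in stream:
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines carry no options.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            args[key.strip()] = value.strip()
    return args


def run_agent(command, options, timeout=60):
    """Caller side: feed options on STDIN, wait, and trust only exit code 0."""
    payload = "".join(f"{key}={value}\n" for key, value in options.items())
    try:
        result = subprocess.run(
            command,
            input=payload,
            text=True,
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # An agent that cannot start or never returns has confirmed nothing.
        return False
    return result.returncode == 0


def fence_with_fallback(methods, options):
    """Try each method in order; return the index of the first success, else None."""
    for index, command in enumerate(methods):
        if run_agent(command, options):
            return index
    return None
```

The point of the sketch is the failure handling: anything other than a
clean exit code 0, including a timeout, has to be treated as "not
fenced", and the caller moves on to the backup method rather than
assuming the target is off.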

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould