[ClusterLabs] Antw: Re: Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

Fri Mar 5 12:26:35 EST 2021

On 3/5/21 6:04 PM, Digimer wrote:
> On 2021-03-05 2:14 a.m., Ulrich Windl wrote:
>>>> How would the fencing be confirmed? I don't know.
>>> It's part of the FenceAgentAPI. The cluster invokes the fence agent,
>>> passes in variable=value pairs on STDIN, and waits for the agent to
>>> exit. It reads the agent's exit code and uses that to determine success
>>> or failure.
>> But the agent "acting remote" cannot be sure the "remote end" was killed,
>> specifically when the network connection seems dead.
>> I see that in the IPMI case you have a separate connection allowing
>> "out-of-band signaling", but in the general case that would not be possible.
> To elaborate on Klaus's reply;
>
> The cluster has no control over how the fence agent works; it can only
> dictate the API and expect that the fence agent is implemented in a sane way.
> If your agent returns success, but the node wasn't confirmed off
> properly in the agent, you will get a split-brain and that will be no
> fault of the cluster itself.
>
> Speaking to the "remote end" part;
>
> All good fence agents need to work regardless of the state of the target
> node. If, somehow, a fence agent needs the target to be in some sort of
> defined state, it is a critically flawed fence agent. A classic example
> of this is the often-requested "ssh fence agent" (and it's why such an
> agent doesn't exist).
>
> So your fence agent must be able to work out of band, by definition and
> design. When you call an IPMI BMC, you are effectively talking to a
> different mini computer on the target. Even then, if the mainboard
> utterly dies and takes the BMC with it, it will fail to fence as well.
> This is why at Alteeve we always have a backup fence method, switched
> PDUs on different switches from the IPMI BMC connections.
>
> Fencing really is critical, and as such, it should be certain to work,
> and ideally, have a backup fence method. So if you find that your
> fence-azure agent isn't reliable, and you can use SBD as Klaus
> mentioned, you can configure fence-sbd as a backup method to fence-azure.
>
Nothing to add - to the point as usual - except that the statement from
Ulrich looked general - not necessarily Azure-specific - and thus my
comment was as well.
Just wanted to state that I didn't advertise SBD as a fencing method
for Azure. SBD needs a reliable watchdog, and afaik softdog is the only
watchdog you have on Azure (maybe different for certain BareMetal
offerings).
Whether you consider that reliable enough is something you have to
negotiate with your own conscience or the provider of your distribution ;-)

Klaus
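
For readers who want to see the contract described in the quoted text
(variable=value pairs on STDIN, exit code as the only verdict), here is a
minimal sketch in Python. It is not taken from any real fence agent. The
option names "action" and "port", the helper names parse_options() and
fence(), and the power_off/is_off callables are all assumptions made for
illustration; a real agent would replace the callables with calls to an
out-of-band device such as a BMC or a switched PDU.

```python
from typing import Callable, Dict, Iterable


def parse_options(lines: Iterable[str]) -> Dict[str, str]:
    """Parse variable=value pairs, one per line, as read from STDIN."""
    options: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            options[key.strip()] = value.strip()
    return options


def fence(
    options: Dict[str, str],
    power_off: Callable[[str], None],  # hypothetical out-of-band "off" call
    is_off: Callable[[str], bool],     # hypothetical out-of-band status call
    attempts: int = 3,
) -> int:
    """Return 0 only when the target is confirmed off, non-zero otherwise."""
    # This sketch only handles the "off" action.
    if options.get("action", "off") != "off":
        return 1
    target = options.get("port", "")
    if not target:
        return 1
    for _ in range(attempts):
        try:
            power_off(target)
            # Success is reported only after the device itself confirms it.
            if is_off(target):
                return 0
        except Exception:
            # Any error means "not confirmed"; retry, never assume success.
            pass
    return 1
```

A real agent would end with something like
sys.exit(fence(parse_options(sys.stdin), ...)), so that the exit code is what
the cluster sees. The point of the design is the one Digimer makes: the agent
reports success only after the out-of-band device confirms the power state,
and every error path ends in a non-zero exit.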