[ClusterLabs] Antw: Re: Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway
Digimer
lists at alteeve.ca
Fri Mar 5 12:04:46 EST 2021
On 2021-03-05 2:14 a.m., Ulrich Windl wrote:
>>> How would the fencing be confirmed? I don't know.
>>
>> It's part of the FenceAgentAPI. The cluster invokes the fence agent,
>> passes in variable=value pairs on STDIN, and waits for the agent to
>> exit. It reads the agent's exit code and uses that to determine success
>> or failure.
>
> But the agent "acting remote" cannot be sure the "remote end" was killed,
> specifically when the network connection seems dead.
> I see that in the IPMI case you have a separate connection allowing
> "out-of-band signaling", but in the general case that would not be possible.
To elaborate on Klaus's reply:
The cluster has no control over how the fence agent works; it can only
dictate the API and trust that the fence agent is implemented sanely.
If your agent returns success but never properly confirmed that the node
was off, you will get a split-brain, and that will be no fault of the
cluster itself.
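To make the contract concrete, here is a minimal Python sketch of the agent side of the FenceAgentAPI as described earlier in this thread: parse the name=value pairs from STDIN, ask an out-of-band device to cut power, and exit 0 only once the "off" state is actually confirmed. The option names in the example (`action`, `ipaddr`, `port`) and the callables `send_off` and `get_status` are illustrative assumptions, not the API of any real agent or of the fence-agents library.

```python
import time


def parse_options(lines):
    """Parse the name=value lines the cluster writes to the agent's STDIN."""
    options = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name.strip()] = value.strip()
    return options


def fence_off(options, send_off, get_status, retries=5, delay=1.0):
    """Request power-off, then return 0 only if the target is confirmed off.

    send_off and get_status are HYPOTHETICAL callables that talk to an
    out-of-band device (BMC, switched PDU, ...). They must not depend on
    the target node itself being reachable or healthy.
    """
    try:
        send_off(options)
        for _ in range(retries):
            if get_status(options) == "off":
                return 0  # confirmed off: safe to report success
            time.sleep(delay)
    except Exception:
        pass  # device unreachable or errored: treat as a failure
    return 1  # never report success unless the off state was confirmed
```

A real agent would call `sys.exit(fence_off(parse_options(sys.stdin), ...))`. The important design choice is the final `return 1`: an unreachable device or an unconfirmed state must be reported as a failure, never as success, or you get exactly the split-brain described above.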
Speaking to the "remote end" part:
All good fence agents need to work regardless of the state of the target
node. If, somehow, a fence agent needs the target to be in some sort of
defined state, it is a critically flawed fence agent. A classic example
of this is the often-requested "ssh fence agent" (and it's why such an
agent doesn't exist).
So your fence agent must be able to work out of band, by definition and
design. When you call an IPMI BMC, you are effectively talking to a
different mini computer on the target. Even then, if the mainboard
utterly dies and takes the BMC with it, it will fail to fence as well.
This is why at Alteeve we always have a backup fence method: switched
PDUs connected to different network switches than the IPMI BMCs.
Fencing really is critical, so it must be certain to work and, ideally,
have a backup fence method. So if you find that your fence-azure agent
isn't reliable, and you can use SBD as Klaus mentioned, you can
configure fence-sbd as a backup method to fence-azure.
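The idea of a backup fence method can be sketched as ordered fence levels, loosely modelled on Pacemaker's fencing topology (configured, for example, with `pcs stonith level add`): try each level in order, treat a level as successful only if every device in it reports success, and stop at the first level that succeeds. This is a simplified Python illustration under those assumptions, not Pacemaker's implementation; the device functions in the usage example below are hypothetical stand-ins for real agents.

```python
from typing import Callable, List

# A "device" is any callable that fences the target and returns an exit code
# (0 = confirmed off), mirroring a fence agent run via the FenceAgentAPI.
FenceDevice = Callable[[str], int]


def fence_with_topology(target: str, levels: List[List[FenceDevice]]) -> bool:
    """Try each fence level in order; a level succeeds only if all its devices do.

    Example layout: level 1 = the IPMI BMC (or fence-azure), level 2 = the
    switched PDUs (or fence-sbd) as the backup method.
    """
    for devices in levels:
        if devices and all(device(target) == 0 for device in devices):
            return True  # target confirmed off; no need to try later levels
    return False  # every method failed: the node must NOT be assumed dead
```

For instance, if the mainboard dies and takes the BMC with it, level 1 fails and the cluster falls through to the PDUs. A level may hold more than one device (say, two hypothetical PDUs feeding a node with redundant power supplies), and it only counts as success when all of them succeed.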
--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould