[ClusterLabs] Antw: Re: How is fencing and unfencing suppose to work?
lists at alteeve.ca
Mon Oct 1 13:08:33 EDT 2018
On 2018-10-01 03:06 AM, Ulrich Windl wrote:
>>>> digimer <lists at alteeve.ca> wrote on 28.09.2018 at 19:11 in message
> <968d00cd-fad5-8f17-edfd-7787a9964355 at alteeve.ca>:
>> On 2018-09-04 8:49 p.m., Ken Gaillot wrote:
>>> On Tue, 2018-08-21 at 10:23 -0500, Ryan Thomas wrote:
>>>> I’m seeing unexpected behavior when using “unfencing” – I don’t think
>>>> I’m understanding it correctly. I configured a resource that
>>>> “requires unfencing” and have a custom fencing agent which “provides
>>>> unfencing”. I perform a simple test where I set up the cluster and
>>>> then run “pcs stonith fence node2”, and I see that node2 is
>>>> successfully fenced by sending an “off” action to my fencing agent.
>>>> But, immediately after this, I see an “on” action sent to my fencing
>>>> agent. My fence agent doesn’t implement the “reboot” action, so
>>>> perhaps it’s trying to reboot by running an off action followed by an
>>>> on action. Prior to adding “provides unfencing” to the fencing
>>>> agent, I didn’t see the on action. It seems unsafe to say “node2, you
>>>> can’t run” and then immediately “you can run”.
>>> I'm not as familiar with unfencing as I'd like, but I believe the basic
>>> idea is:
>>> - the fence agent's off action cuts the machine off from something
>>> essential needed to run resources (generally shared storage or network)
>>> - the fencing works such that a fenced host is not able to request
>>> rejoining the cluster without manual intervention by a sysadmin
>>> - when the sysadmin allows the host back into the cluster, and it
>>> contacts the other nodes to rejoin, the cluster will call the fence
>>> agent's on action, which is expected to re-enable the host's access
>>> How that works in practice, I have only vague knowledge.
>> This is correct. Consider fabric fencing where fiber channel ports are
>> disconnected. Unfence restores the connection. Similar to a pure 'off'
>> fence call to switched PDUs, as you mention above. Unfence powers the
>> outlets back up.
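
The off/on pairing described above can be pictured with a small toy model. This is only a sketch, assuming a hypothetical ToyFabricSwitch class that tracks per-node access; it is not a real Pacemaker fence agent and does not use any fencing library. It only shows that "off" removes a node's access, "on" (unfence) restores it, and an action the agent does not implement, such as "reboot", is rejected.

```python
# Toy model of a fabric-style fence device. Hypothetical names throughout;
# this is NOT a real Pacemaker fence agent, only an illustration.
class ToyFabricSwitch:
    """Tracks which nodes currently have access to the shared resource."""

    def __init__(self, nodes):
        # Every node starts with access enabled.
        self.enabled = {node: True for node in nodes}

    def run_action(self, action, node):
        """Apply a fence action and return the node's resulting access state."""
        if action == "off":
            self.enabled[node] = False   # fence: cut the node off
        elif action == "on":
            self.enabled[node] = True    # unfence: restore access
        elif action == "status":
            pass                         # report only, change nothing
        else:
            # Like the agent in the original question, no "reboot" here.
            raise ValueError(f"unsupported action: {action}")
        return self.enabled[node]


switch = ToyFabricSwitch(["node1", "node2"])
after_fence = switch.run_action("off", "node2")
after_unfence = switch.run_action("on", "node2")
```

In this model nothing prevents "on" from following "off" immediately; whether that is safe depends on when the cluster decides to issue the unfence, which is the question raised at the top of the thread.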
> I doubt whether successful fencing can be emulated by "pausing" I/O: when
> re-establishing the fabric, outstanding I/Os might be performed (which cannot
> happen after real fencing).
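
The concern about outstanding I/O can be illustrated with a deliberately simplified simulation. It is a sketch under loud assumptions: ToyHost, its in-memory pending queue, and the two scenario functions are all hypothetical, and real HBA, multipath and filesystem behaviour depends on timeouts and error handling that are not modelled here. The point is only that a write queued in a host's memory survives a link outage but not a power cycle.

```python
# Toy simulation of stale writes after fabric fencing vs. power fencing.
# All names are hypothetical; real storage stacks are far more complex.
class ToyHost:
    def __init__(self):
        self.pending = []     # writes issued but not yet on the shared disk
        self.link_up = True   # whether the host can reach the shared disk

    def write(self, data, disk):
        self.pending.append(data)
        self.flush(disk)

    def flush(self, disk):
        # Queued writes reach the disk only while the link is up.
        if self.link_up:
            disk.extend(self.pending)
            self.pending.clear()


def fabric_fence_then_unfence():
    disk = []
    host = ToyHost()
    host.link_up = False               # fabric fence: port disabled
    host.write("stale-write", disk)    # stays queued in host memory
    disk.append("new-owner-write")     # another node takes over the data
    host.link_up = True                # unfence: port re-enabled
    host.flush(disk)                   # the queued write now lands
    return disk


def power_fence_then_boot():
    disk = []
    host = ToyHost()
    host.link_up = False
    host.write("stale-write", disk)
    disk.append("new-owner-write")
    host = ToyHost()                   # power cycle: in-memory queue is lost
    host.flush(disk)                   # nothing left to write
    return disk


fabric_result = fabric_fence_then_unfence()
power_result = power_fence_then_boot()
```

In the fabric case the stale write ends up on disk after the new owner's write; in the power case it never arrives.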
I have never been a fan of fabric fencing, and that is exactly one
reason why. Another is panicked admins, not understanding what's
happening, and turning ports back up. To me, power fencing is the only
That said, the question was about unfence, and that is what I was
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould