[ClusterLabs] DLM fencing

Digimer lists at alteeve.ca
Mon Feb 8 19:03:38 UTC 2016


On 08/02/16 01:56 PM, Ferenc Wágner wrote:
> Ken Gaillot <kgaillot at redhat.com> writes:
> 
>> On 02/07/2016 12:21 AM, G Spot wrote:
>>
>>> Thanks for your response, am using ocf:pacemaker:controld resource
>>> agent and stonith-enabled=false do I need to configure stonith device
>>> to make this work?
>>
>> Correct. DLM requires access to fencing.
> 
> I've meant to explore this connection for a long time, but never found much
> useful material on the subject.  How does DLM fencing fit into the
> modern Pacemaker architecture?  Fencing is a confusing topic in itself
> already (fence_legacy, fence_pcmk, stonith, stonithd, stonith_admin),
> then dlm_controld can use dlm_stonith to proxy fencing requests to
> Pacemaker, and it becomes hopeless... :)
> 
> I'd be grateful for a pointer to a good overview document, or a quick
> sketch if you can spare the time.  To invoke some concrete questions:
> When does DLM fence a node?  Is it necessary only when there's no
> resource manager running on the cluster?  Does it matter whether
> dlm_controld is run as a standalone daemon or as a controld resource?
> Wouldn't Pacemaker fence a failing node itself all the same?  Or is
> dlm_stonith for the case when only the stonithd component of Pacemaker
> is active somehow?

DLM is a thing unto itself, and some tools like gfs2 and clustered-lvm
use it to coordinate locking across the cluster. If a node drops out,
the cluster informs DLM, and DLM blocks until the lost node is confirmed
fenced. Then it reaps the lost node's locks and recovery can begin.

If fencing fails or is not configured, DLM never unblocks and anything
using it is left hung (by design; better to hang than risk corruption).

One of many reasons why fencing is critical.
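
A rough sketch of that sequence, as a toy Python model. This is not how
dlm_controld is actually implemented; the class, method, and resource
names are all made up for illustration, and where the real DLM would
simply hang, the toy raises an error so the blocked state is visible.

```python
# Toy model only: NOT dlm_controld's real logic. All names are hypothetical.


class ToyLockManager:
    def __init__(self, nodes):
        self.members = set(nodes)
        self.locks = {}              # resource name -> owning node
        self.pending_fence = set()   # lost nodes not yet confirmed fenced

    @property
    def blocked(self):
        # Lock traffic stays blocked while any lost node is unfenced.
        return bool(self.pending_fence)

    def acquire(self, node, resource):
        if self.blocked:
            # The real DLM would hang here; the toy raises instead.
            raise RuntimeError(
                "blocked, waiting for fencing of %s" % sorted(self.pending_fence)
            )
        if node not in self.members:
            raise ValueError("%s is not a cluster member" % node)
        if resource in self.locks:
            return False             # already held by some node
        self.locks[resource] = node
        return True

    def node_lost(self, node):
        # The cluster reports a lost node: block until it is fenced.
        self.members.discard(node)
        self.pending_fence.add(node)

    def fence_confirmed(self, node):
        # Fencing succeeded: reap the lost node's locks, then unblock.
        if node not in self.pending_fence:
            return []
        reaped = sorted(r for r, owner in self.locks.items() if owner == node)
        for resource in reaped:
            del self.locks[resource]
        self.pending_fence.discard(node)
        return reaped


# Example run: node2 holds a lock, drops out, and is then fenced.
demo = ToyLockManager(["node1", "node2"])
demo.acquire("node2", "example-resource")
demo.node_lost("node2")          # demo.blocked is now True
demo.fence_confirmed("node2")    # reaps node2's lock; demo.blocked is False
```

If fence_confirmed() is never called, which is the "fencing fails or is
not configured" case, the toy stays blocked forever, just like the real
thing.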

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



