[ClusterLabs] Any CLVM/DLM users around?

Ferenc Wágner wagner.ferenc at kifu.gov.hu
Mon Oct 1 16:04:29 UTC 2018


Patrick Whitney <pwhitney at luminoso.com> writes:

> I have a two node (test) cluster running corosync/pacemaker with DLM
> and CLVM.
>
> I was running into an issue where when one node failed, the remaining node
> would appear to do the right thing, from the pcmk perspective, that is.
> It would create a new cluster (of one) and fence the other node, but
> then, rather surprisingly, DLM would see the other node offline, and it
> would go offline itself, abandoning the lockspace.
>
> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing, and
> our tests are now working as expected.

I'm running a larger Pacemaker cluster with standalone DLM + cLVM (that
is, they are started by systemd, not by Pacemaker).  I've seen weird DLM
fencing behavior, but not what you describe above (though I ran with
more than two nodes from the very start).  Actually, I don't even
understand how it occurred to you to disable DLM fencing to fix that...

> I'm a little concerned that I have masked an issue by doing this, as in all
> of the tutorials and docs I've read, there is no mention of having to
> configure DLM whatsoever.

Unfortunately it's very hard to come by any reliable info about DLM.  I
had a couple of enlightening exchanges with David Teigland (its primary
author) on this list; he is very helpful indeed, but I'm still very far
from having a working understanding of it.

But I've been running with --enable_fencing=0 for years without issues,
leaving all fencing to Pacemaker.  Note that manual cLVM operations are
the only users of DLM here, so delayed fencing does not cause any
problems: the cluster services do not depend on DLM being operational
(it can stay frozen for several days, as has happened in a couple of
pathological cases).  GFS2 would be a very different thing, I guess.
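As far as I understand dlm.conf(5), the same option can also be set as `enable_fencing=0` in a config file (assumed here to be `/etc/dlm/dlm.conf`) instead of on the dlm_controld command line. Below is a minimal Python sketch, not part of the DLM tooling, that checks whether such a file disables DLM fencing. It assumes one `key=value` option per line, `#` comment lines, and a default of 1 when the key is absent, and it ignores any command-line override. `parse_dlm_conf` and `dlm_fencing_enabled` are hypothetical names.

```python
# Hypothetical helper, not part of dlm_controld: checks whether a dlm.conf
# text disables DLM fencing. Assumes one key=value option per line, '#'
# comment lines, and a default of enable_fencing=1 when the key is absent.
# A --enable_fencing command-line option would override the file; that
# case is not handled here.

DEFAULT_ENABLE_FENCING = "1"  # assumed default when the option is unset


def parse_dlm_conf(text: str) -> dict:
    """Collect simple key=value options, skipping blanks, comments and
    multi-word lines (e.g. fence device definitions)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if not key or any(ch.isspace() for ch in key):
            continue  # not a plain option line
        options[key] = value.strip()
    return options


def dlm_fencing_enabled(text: str) -> bool:
    """True unless enable_fencing is explicitly set to 0."""
    value = parse_dlm_conf(text).get("enable_fencing", DEFAULT_ENABLE_FENCING)
    return int(value) != 0


if __name__ == "__main__":
    sample = "# /etc/dlm/dlm.conf\nenable_fencing=0\n"
    print(dlm_fencing_enabled(sample))  # → False
```

This only inspects the file; with DLM started by systemd, the option may just as well be passed on the dlm_controld command line, which this sketch does not see.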
-- 
Regards,
Feri

