[ClusterLabs] Antw: Any CLVM/DLM users around?

Patrick Whitney pwhitney at luminoso.com
Mon Oct 1 16:01:36 EDT 2018


Hi Ulrich,

When I first encountered this issue, I posted this:

https://lists.clusterlabs.org/pipermail/users/2018-September/015637.html

... I was using resource fencing in this example but, as I've mentioned
before, the issue would come about not when fencing occurred, but when the
fenced node was shut down.

During that discussion, you and others suggested that power fencing was
the only way DLM was going to cooperate, and using meatware was proposed
as one option.

Unfortunately, I found out later that meatware is no longer available (
https://lists.clusterlabs.org/pipermail/users/2018-September/015715.html).
We were lucky enough that our test environment is a KVM/libvirt
environment, so I used fence_virsh instead.  Again, I had the same
problem: when the "bad" node was fenced, dlm_controld would issue (what
appears to be) a fence_all, I would receive messages that the dlm clone
was down on all members, and the logs would report that the clvm
lockspace was abandoned.
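
For reference, the fence_virsh device was set up along these lines (a
sketch rather than my exact command; the host address, credentials, and
libvirt domain names here are placeholders, and some parameter names
differ between fence_virsh versions):

    # hypothetical example: one stonith device per node, talking to the
    # libvirt host over ssh
    pcs stonith create fence-node2 fence_virsh \
        ipaddr=kvm-host.example.com login=root passwd=secret \
        port=node2-vm pcmk_host_list=node2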

It was only when I disabled fencing for DLM (enable_fencing=0 in
dlm.conf, while keeping fencing enabled in pcmk) that things began to work
as expected.
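
In other words, the whole change was a single line in dlm.conf (usually
/etc/dlm/dlm.conf):

    # dlm.conf: stop dlm_controld from requesting its own fencing;
    # pacemaker/stonith still does the actual fencing
    enable_fencing=0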

One suggestion earlier in this thread was to try disabling DLM's startup
fencing (enable_startup_fencing=0), which sounds like a plausible solution
after looking over the logs, but I haven't tested it yet.
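
If that works out, the dlm.conf change would look something like this
instead (untested on my side; just a sketch of the option named above):

    # dlm.conf -- untested alternative
    # as I understand it, this keeps dlm fencing enabled overall but skips
    # the fencing dlm_controld does at daemon startup for nodes whose
    # state it doesn't yet know
    enable_startup_fencing=0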

The conclusions I'm coming to are:
1. DLM cannot handle resource fencing because it keeps its own
"heartbeat/control" channel (for lack of a better term) over the network,
and pcmk cannot instruct DLM "don't worry about that guy over there",
which means we must use power fencing; but
2. DLM does not like to see one of its members disappear; when that does
happen, DLM does "something" which causes the lockspace to disappear,
unless you disable fencing for DLM (see the check commands sketched
below).
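
For the record, something like the following should show whether the
lockspace survives on the remaining node after a fence (the "clvmd"
lockspace name is an assumption, and the output format varies by
version):

    # on the surviving node, after the peer has been fenced
    dlm_tool ls             # the clvmd lockspace should still be listed
    dlm_tool status         # membership/fencing state as dlm_controld sees it
    corosync-quorumtool -s  # confirm corosync still has quorum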

I am now speculating that DLM restarts when its communications fail, and
that disabling startup fencing for DLM (enable_startup_fencing=0) may be
the solution to my problem, letting me revert my enable_fencing=0 DLM
config.

Best,
-Pat

On Mon, Oct 1, 2018 at 3:38 PM Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:

> Hi!
>
> It would be much more helpful if you could provide logs around the
> problem events. Personally, I think you _must_ implement proper fencing.
> In addition, DLM seems to do its own fencing when there is a
> communication problem.
>
> Regards,
> Ulrich
>
>
> >>> Patrick Whitney <pwhitney at luminoso.com> 01.10.18 16:25 >>>
> Hi Everyone,
>
> I wanted to solicit input on my configuration.
>
> I have a two node (test) cluster running corosync/pacemaker with DLM and
> CLVM.
>
> I was running into an issue where, when one node failed, the remaining
> node would appear to do the right thing, from the pcmk perspective, that
> is: it would create a new cluster (of one) and fence the other node. But
> then, rather surprisingly, DLM would see the other node offline and
> would go offline itself, abandoning the lockspace.
>
> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing, and
> our tests are now working as expected.
>
> I'm a little concerned I have masked an issue by doing this, as in all
> of the tutorials and docs I've read there is no mention of having to
> configure DLM whatsoever.
>
> Is anyone else running a similar stack and can comment?
>
> Best,
> -Pat
> --
> Patrick Whitney
> DevOps Engineer -- Tools
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>


-- 
Patrick Whitney
DevOps Engineer -- Tools

