<div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Ulrich, <div><br></div><div>When I first encountered this issue, I posted this:</div><div><br></div><div><a href="https://lists.clusterlabs.org/pipermail/users/2018-September/015637.html">https://lists.clusterlabs.org/pipermail/users/2018-September/015637.html</a><br></div><div><br></div><div>... I was using resource fencing in this example, but, as I've mentioned before, the issue would come about, not when fencing occurred, but when the fenced node was shutdown (we were using resource fencing). </div><div><br></div><div>During that discussion, yourself and others suggested that power fencing was the only way DLM was going to cooperate and one suggestion of using meatware was proposed. </div><div><br></div><div>Unfortunately, I found out later that meatware was no longer available (<a href="https://lists.clusterlabs.org/pipermail/users/2018-September/015715.html">https://lists.clusterlabs.org/pipermail/users/2018-September/015715.html</a>), so we were lucky enough our test environment is a KVM/libvirt environment, so I used fence_virsh. Again, I had the same problem... when the "bad" node was fenced, dlm_controld would issue (what appears to be) a fence_all, and I would receive messages that that the dlm clone was down on all members and would have a log message that the clvm lockspace was abandoned. </div><div><br></div><div>It was only when I disabled fencing for dlm (enable_fencing=0 in dlm.conf; but kept fencing enabled in pcmk) did things begin to work as expected. </div><div><br></div><div>One suggestion earlier in this thread suggests trying the dlm configuration of disabling startup fencing (enable_startup_fencing=0), which sounds like a plausible solution after looking over the logs, but I haven't tested yet. </div><div><br></div><div>The conclusion I'm coming to is:</div><div>1. The reason DLM cannot handle resource fencing is because it keeps its own "heartbeat/control" channel (for lack of a better term) via the network, and pcmk cannot instruct DLM "Don't worry about that guy over there" which means we must use power fencing, but;</div><div>2. DLM does not like to see one of its members disappear; when that does happen, DLM does "something" which causes the lockspace to disappear... unless you disable fencing for DLM. </div><div><br></div><div>I am now speculating that DLM restarts when the communications fail, and the theory that disabling startup fencing for DLM (enable_startup_fencing=0) may be the solution to my problem (reverting my enable_fencing=0 DLM config). </div><div><br></div><div>Best,</div><div>-Pat</div></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Oct 1, 2018 at 3:38 PM Ulrich Windl <<a href="mailto:Ulrich.Windl@rz.uni-regensburg.de">Ulrich.Windl@rz.uni-regensburg.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi!<br>

On Mon, Oct 1, 2018 at 3:38 PM Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
> Hi!
>
> It would be much more helpful if you could provide logs around the
> problem events. Personally, I think you _must_ implement proper fencing.
> In addition, DLM seems to do its own fencing when there is a
> communication problem.
>
> Regards,
> Ulrich
>
> >>> Patrick Whitney <pwhitney@luminoso.com> 01.10.18 16:25 >>>
> Hi Everyone,
>
> I wanted to solicit input on my configuration.
>
> I have a two-node (test) cluster running corosync/pacemaker with DLM and
> CLVM.
>
> I was running into an issue where, when one node failed, the remaining
> node would appear to do the right thing (from the pcmk perspective, that
> is). It would create a new cluster (of one) and fence the other node,
> but then, rather surprisingly, DLM would see the other node offline and
> would go offline itself, abandoning the lockspace.
>
> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing,
> and our tests are now working as expected.
>
> I'm a little concerned I have masked an issue by doing this, as in all
> of the tutorials and docs I've read there is no mention of having to
> configure DLM whatsoever.
>
> Is anyone else running a similar stack who can comment?
>
> Best,
> -Pat
> --
> Patrick Whitney
> DevOps Engineer -- Tools

--
Patrick Whitney
DevOps Engineer -- Tools