[ClusterLabs] Any CLVM/DLM users around?
lists at alteeve.ca
Mon Oct 1 12:30:09 EDT 2018
On 2018-10-01 12:04 PM, Ferenc Wágner wrote:
> Patrick Whitney <pwhitney at luminoso.com> writes:
>> I have a two node (test) cluster running corosync/pacemaker with DLM
>> and CLVM.
>> I was running into an issue where when one node failed, the remaining node
>> would appear to do the right thing, from the pcmk perspective, that is.
>> It would create a new cluster (of one) and fence the other node, but
>> then, rather surprisingly, DLM would see the other node offline, and it
>> would go offline itself, abandoning the lockspace.
>> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing, and
>> our tests are now working as expected.
> I'm running a larger Pacemaker cluster with standalone DLM + cLVM (that
> is, they are started by systemd, not by Pacemaker). I've seen weird DLM
> fencing behavior, but not what you describe above (though I ran with
> more than two nodes from the very start). Actually, I don't even
> understand how it occurred to you to disable DLM fencing to fix that...
Fencing in clustering is always required. Pacemaker lets you turn it
off and take your chances, but DLM doesn't. You must have working
fencing for DLM (and anything using it) to function correctly.
Basically: the cluster config changes (a node is declared lost), DLM is
informed and blocks, a fence attempt begins and loops until it succeeds,
and on success DLM is informed, reaps the locks held by the lost node,
and normal operation resumes.
This isn't a question of node count or other configuration concerns.
It's simply that you must have proper fencing for DLM.
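To make that sequence concrete, here is a toy Python model of it. This
is only a sketch for illustration, not how dlm_controld is implemented:
ToyLockspace, handle_node_loss and make_flaky_fence are hypothetical
names, and the "fence agent" is a stand-in callable that fails a fixed
number of times before succeeding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyLockspace:
    """Toy model of a lockspace; hypothetical, not the real DLM API."""
    locks: Dict[str, str] = field(default_factory=dict)  # resource -> owner
    blocked: bool = False
    log: List[str] = field(default_factory=list)

    def acquire(self, resource: str, node: str) -> bool:
        # Requests are refused while recovery is pending or if already held.
        if self.blocked or resource in self.locks:
            return False
        self.locks[resource] = node
        return True

    def handle_node_loss(self, lost: str, fence: Callable[[str], bool]) -> int:
        """Block, fence until success, reap the lost node's locks, resume."""
        self.blocked = True
        self.log.append(f"blocked: {lost} declared lost")
        attempts = 1
        while not fence(lost):  # loops until fencing succeeds
            self.log.append(f"fence attempt {attempts} failed")
            attempts += 1
        self.log.append(f"fenced {lost} after {attempts} attempt(s)")
        # Only after a confirmed fence are the lost node's locks reaped.
        self.locks = {r: n for r, n in self.locks.items() if n != lost}
        self.blocked = False
        self.log.append("resumed")
        return attempts


def make_flaky_fence(failures: int) -> Callable[[str], bool]:
    """Hypothetical fence agent: fails `failures` times, then succeeds."""
    remaining = [failures]

    def fence(node: str) -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return fence


if __name__ == "__main__":
    ls = ToyLockspace()
    ls.acquire("vg0", "node1")
    ls.acquire("vg1", "node2")
    ls.handle_node_loss("node2", make_flaky_fence(2))
    print("\n".join(ls.log))
    print(ls.locks)
```

The point of the model is that the lockspace stays blocked for as long
as the fence call keeps failing. That is why a broken or missing fence
method shows up as DLM hanging rather than as a clean error.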
>> I'm a little concerned I have masked an issue by doing this, as in all
>> of the tutorials and docs I've read, there is no mention of having to
>> configure DLM whatsoever.
> Unfortunately it's very hard to come by any reliable info about DLM. I
> had a couple of enlightening exchanges with David Teigland (its primary
> author) on this list, he is very helpful indeed, but I'm still very far
> from having a working understanding of it.
> But I've been running with --enable_fencing=0 for years without issues,
> leaving all fencing to Pacemaker. Note that manual cLVM operations are
> the only users of DLM here, so delayed fencing does not cause any
> problems, the cluster services do not depend on DLM being operational (I
> mean it can stay frozen for several days -- as it happened in a couple
> of pathological cases). GFS2 would be a very different thing, I guess.
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould