[ClusterLabs] Any CLVM/DLM users around?
Digimer
lists at alteeve.ca
Mon Oct 1 13:05:33 EDT 2018
On 2018-10-01 12:55 PM, Patrick Whitney wrote:
> Fencing in clustering is always required, but unlike pacemaker that lets
> you turn it off and take your chances, DLM doesn't.
>
>
> As a matter of fact, DLM has a setting "enable_fencing=0|1" for what
> that's worth.
I did not know that... Interesting. Dangerous, but interesting.
> You must have
> working fencing for DLM (and anything using it) to function correctly.
>
>
> We do have fencing enabled in the cluster; we've tested both node level
> fencing and resource fencing; DLM behaved identically in both scenarios,
> until we set it to 'enable_fencing=0' in the dlm.conf file.
>
>
> Basically, cluster config changes (node declared lost), dlm informed and
> blocks, fence attempt begins and loops until it succeeds, on success,
> informs DLM, dlm reaps locks held by the lost node and normal operation
> continues.
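
The sequence described above can be sketched as a toy simulation. This is only an illustrative Python model under stated assumptions, not dlm_controld code: the names `recover_lockspace` and `make_flaky_fence`, the lock table, and the `max_attempts` cap are all hypothetical. As described above, the real loop retries until fencing succeeds; the cap exists only so the sketch terminates.

```python
from typing import Callable, Dict, Set


def recover_lockspace(
    lost_node: int,
    locks: Dict[str, int],
    fence: Callable[[int], bool],
    max_attempts: int = 10,
) -> Set[str]:
    """Toy model: block, fence until success, then reap the lost node's locks.

    `locks` maps a lock name to the node id holding it (hypothetical layout).
    `fence` is a stand-in for whatever fence agent the cluster uses.
    """
    blocked = True  # lock activity is blocked once the node is declared lost
    attempts = 0
    while blocked:
        attempts += 1
        if fence(lost_node):
            blocked = False  # fencing confirmed, recovery may proceed
        elif attempts >= max_attempts:
            # Assumption for the sketch only; the real loop keeps retrying.
            raise RuntimeError("fencing never succeeded; lockspace stays blocked")

    # Reap every lock held by the fenced node, then resume normal operation.
    reaped = {name for name, owner in locks.items() if owner == lost_node}
    for name in reaped:
        del locks[name]
    return reaped


def make_flaky_fence(failures: int) -> Callable[[int], bool]:
    """Return a fake fence agent that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def fence(node: int) -> bool:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return False
        return True

    return fence


if __name__ == "__main__":
    table = {"vg0": 2, "vg1": 1, "lv_data": 2}
    print(recover_lockspace(2, table, make_flaky_fence(2)), table)
```

The point of the model is that nothing touches the lock table until the fence call returns success, which is why a fence agent that never succeeds leaves everything using DLM blocked.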
>
> This isn't quite what I was seeing in the logs. The "failed" node would
> be fenced off, pacemaker appeared to be sane, reporting services running
> on the running nodes, but once the failed node was seen as missing by
> dlm (dlm_controld), dlm would request fencing, from what I can tell by
> the log entry. Here is an example of the suspect log entry:
> Sep 26 09:41:35 pcmk-test-1 dlm_controld[837]: 38 fence request 2 pid 1446 startup time 1537969264 fence_all dlm_stonith
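
For anyone wanting to pull these entries out of a journal, here is a minimal Python sketch that splits the quoted log line into fields. The field names (`uptime`, `nodeid`, `agent_pid`, `reason`) are my reading of the message layout and should be treated as assumptions. Check the dlm_controld source before relying on them. The function name `parse_fence_request` is hypothetical.

```python
import re
from typing import Dict, Optional

# Pattern for the "fence request" message as it appears in the thread.
# Group names are assumed interpretations, not documented field names.
FENCE_REQUEST = re.compile(
    r"dlm_controld\[(?P<daemon_pid>\d+)\]:\s+(?P<uptime>\d+)\s+"
    r"fence request\s+(?P<nodeid>\d+)\s+pid\s+(?P<agent_pid>\d+)\s+"
    r"(?P<reason>\S+)\s+time\s+(?P<time>\d+)\s+(?P<rest>.*)$"
)


def parse_fence_request(line: str) -> Optional[Dict[str, str]]:
    """Return the matched fields as strings, or None if the line does not match."""
    normalized = " ".join(line.split())  # collapse wrapping and extra spaces
    match = FENCE_REQUEST.search(normalized)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "Sep 26 09:41:35 pcmk-test-1 dlm_controld[837]: 38 fence request 2 pid "
        "1446 startup time 1537969264 fence_all dlm_stonith"
    )
    print(parse_fence_request(sample))
```

Filtering on the `nodeid` group makes it easy to see whether dlm_controld is asking to fence a node that Pacemaker has already fenced.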
>
>
> This isn't a question of node count or other configuration concerns.
> It's simply that you must have proper fencing for DLM.
>
>
> Can you speak more to what "proper fencing" is for DLM?
>
> Best,
> -Pat
>
>
>
> On Mon, Oct 1, 2018 at 12:30 PM Digimer <lists at alteeve.ca> wrote:
>
> On 2018-10-01 12:04 PM, Ferenc Wágner wrote:
> > Patrick Whitney <pwhitney at luminoso.com> writes:
> >
> >> I have a two node (test) cluster running corosync/pacemaker with DLM
> >> and CLVM.
> >>
> >> I was running into an issue where when one node failed, the
> >> remaining node would appear to do the right thing, from the pcmk
> >> perspective, that is. It would create a new cluster (of one) and
> >> fence the other node, but then, rather surprisingly, DLM would see
> >> the other node offline, and it would go offline itself, abandoning
> >> the lockspace.
> >>
> >> I changed my DLM settings to "enable_fencing=0", disabling DLM
> >> fencing, and our tests are now working as expected.
> >
> > I'm running a larger Pacemaker cluster with standalone DLM + cLVM
> > (that is, they are started by systemd, not by Pacemaker). I've seen
> > weird DLM fencing behavior, but not what you describe above (though
> > I ran with more than two nodes from the very start). Actually, I
> > don't even understand how it occurred to you to disable DLM fencing
> > to fix that...
>
> Fencing in clustering is always required, but unlike pacemaker that lets
> you turn it off and take your chances, DLM doesn't. You must have
> working fencing for DLM (and anything using it) to function correctly.
>
> Basically, cluster config changes (node declared lost), dlm informed and
> blocks, fence attempt begins and loops until it succeeds, on success,
> informs DLM, dlm reaps locks held by the lost node and normal operation
> continues.
>
> This isn't a question of node count or other configuration concerns.
> It's simply that you must have proper fencing for DLM.
>
> >> I'm a little concerned I have masked an issue by doing this, as in all
> >> of the tutorials and docs I've read, there is no mention of having to
> >> configure DLM whatsoever.
> >
> > Unfortunately it's very hard to come by any reliable info about
> > DLM. I had a couple of enlightening exchanges with David Teigland
> > (its primary author) on this list, he is very helpful indeed, but
> > I'm still very far from having a working understanding of it.
> >
> > But I've been running with --enable_fencing=0 for years without
> > issues, leaving all fencing to Pacemaker. Note that manual cLVM
> > operations are the only users of DLM here, so delayed fencing does
> > not cause any problems, the cluster services do not depend on DLM
> > being operational (I mean it can stay frozen for several days -- as
> > it happened in a couple of pathological cases). GFS2 would be a
> > very different thing, I guess.
> >
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay
> Gould
>
>
>
> --
> Patrick Whitney
> DevOps Engineer -- Tools
--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould