[ClusterLabs] Any CLVM/DLM users around?

FeldHost™ Admin admin at feldhost.cz
Mon Oct 1 13:00:10 EDT 2018


You probably need enable_startup_fencing = 0 rather than enable_fencing = 0; that skips fencing only for nodes whose state is unknown when dlm_controld starts, while normal runtime fencing stays active.
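For reference, a minimal sketch of what I mean in /etc/dlm/dlm.conf (values are illustrative; check dlm.conf(5) on your distribution):

```
# /etc/dlm/dlm.conf -- illustrative sketch
# Skip fencing of nodes whose state is unknown at dlm_controld startup,
# but leave normal runtime fencing enabled.
enable_startup_fencing=0
enable_fencing=1
```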

Best regards, Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: support at feldhost.cz

www.feldhost.cz - FeldHost™ – Hosting services tailored to you. Do you have specific requirements? We can handle them.

FELDSAM s.r.o.
V Chotejně 765/15
Praha 10 – Hostivař, PSČ 102 00
Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
File No. C 200350, registered with the Municipal Court in Prague

Bank: Fio banka a.s.
Account number: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010 0000 0024 0033 0446

> On 1 Oct 2018, at 18:55, Patrick Whitney <pwhitney at luminoso.com> wrote:
> 
>> Fencing in clustering is always required, but unlike pacemaker that lets
>> you turn it off and take your chances, DLM doesn't.
> 
> As a matter of fact, DLM has a setting "enable_fencing=0|1", for what that's worth.
> 
>> You must have
>> working fencing for DLM (and anything using it) to function correctly.
> 
> We do have fencing enabled in the cluster; we've tested both node-level fencing and resource fencing, and DLM behaved identically in both scenarios until we set 'enable_fencing=0' in the dlm.conf file.
> 
>> Basically, cluster config changes (node declared lost), dlm informed and
>> blocks, fence attempt begins and loops until it succeeds, on success,
>> informs DLM, dlm reaps locks held by the lost node and normal operation
>> continues.
> 
> This isn't quite what I was seeing in the logs.  The "failed" node would be fenced off, and pacemaker appeared to be sane, reporting services running on the surviving nodes; but once the failed node was seen as missing by dlm_controld, dlm would request fencing itself, as far as I can tell from the log.  Here is an example of the suspect log entry:
> 
> Sep 26 09:41:35 pcmk-test-1 dlm_controld[837]: 38 fence request 2 pid 1446 startup time 1537969264 fence_all dlm_stonith
> 
>> This isn't a question of node count or other configuration concerns.
>> It's simply that you must have proper fencing for DLM.
> 
> Can you speak more to what "proper fencing" is for DLM?
> 
> Best,
> -Pat
> 
>   
> 
> On Mon, Oct 1, 2018 at 12:30 PM Digimer <lists at alteeve.ca> wrote:
> On 2018-10-01 12:04 PM, Ferenc Wágner wrote:
> > Patrick Whitney <pwhitney at luminoso.com> writes:
> > 
> >> I have a two node (test) cluster running corosync/pacemaker with DLM
> >> and CLVM.
> >>
> >> I was running into an issue where when one node failed, the remaining node
> >> would appear to do the right thing, from the pcmk perspective, that is.
> >> It would  create a new cluster (of one) and fence the other node, but
> >> then, rather surprisingly, DLM would see the other node offline, and it
> >> would go offline itself, abandoning the lockspace.
> >>
> >> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing, and
> >> our tests are now working as expected.
> > 
> > I'm running a larger Pacemaker cluster with standalone DLM + cLVM (that
> > is, they are started by systemd, not by Pacemaker).  I've seen weird DLM
> > fencing behavior, but not what you describe above (though I ran with
> > more than two nodes from the very start).  Actually, I don't even
> > understand how it occurred to you to disable DLM fencing to fix that...
> 
> Fencing in clustering is always required, but unlike pacemaker that lets
> you turn it off and take your chances, DLM doesn't. You must have
> working fencing for DLM (and anything using it) to function correctly.
> 
> Basically, cluster config changes (node declared lost), dlm informed and
> blocks, fence attempt begins and loops until it succeeds, on success,
> informs DLM, dlm reaps locks held by the lost node and normal operation
> continues.
> 
> This isn't a question of node count or other configuration concerns.
> It's simply that you must have proper fencing for DLM.
> 
> >> I'm a little concerned that I have masked an issue by doing this, as in all
> >> of the tutorials and docs I've read, there is no mention of having to
> >> configure DLM whatsoever.
> > 
> > Unfortunately it's very hard to come by any reliable info about DLM.  I
> > had a couple of enlightening exchanges with David Teigland (its primary
> > author) on this list, he is very helpful indeed, but I'm still very far
> > from having a working understanding of it.
> > 
> > But I've been running with --enable_fencing=0 for years without issues,
> > leaving all fencing to Pacemaker.  Note that manual cLVM operations are
> > the only users of DLM here, so delayed fencing does not cause any
> > problems; the cluster services do not depend on DLM being operational (I
> > mean it can stay frozen for several days -- as it happened in a couple
> > of pathological cases).  GFS2 would be a very different thing, I guess.
> > 
> 
> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
> 
> 
> -- 
> Patrick Whitney
> DevOps Engineer -- Tools
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
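As an aside, the dlm_controld line quoted above is compact. A quick sketch of how I read its fields, as a small parser; the field meanings in the comments are my interpretation of the daemon's output, not something taken from DLM documentation:

```python
import re

# Parse the dlm_controld message quoted earlier in the thread.
# Field meanings below are assumptions on my part:
#   38            -> counter/time offset since dlm_controld started
#   2             -> nodeid of the node being fenced
#   pid 1446      -> pid of the forked fence helper process
#   startup time  -> join timestamp of the victim node (presumably to
#                    avoid fencing a node that has already rejoined)
#   fence_all dlm_stonith -> the configured fence helper (see dlm.conf)
line = ("38 fence request 2 pid 1446 startup time 1537969264 "
        "fence_all dlm_stonith")
m = re.match(r"(\d+) fence request (\d+) pid (\d+) "
             r"startup time (\d+) fence_all (\S+)", line)
offset, nodeid, pid, start, helper = m.groups()
print("fencing nodeid", nodeid, "via", helper)
```

If that reading is right, the entry shows dlm_controld independently asking its own helper (dlm_stonith, which relays to Pacemaker's stonith API) to fence node 2, separately from whatever Pacemaker already did.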
