[ClusterLabs] Any CLVM/DLM users around?

Digimer lists at alteeve.ca
Mon Oct 1 13:05:33 EDT 2018


On 2018-10-01 12:55 PM, Patrick Whitney wrote:
>     Fencing in clustering is always required, but unlike pacemaker that lets
>     you turn it off and take your chances, DLM doesn't.
> 
> 
> As a matter of fact, DLM has a setting "enable_fencing=0|1" for what
> that's worth.   

I did not know that... Interesting. Dangerous, but interesting.

>     You must have
>     working fencing for DLM (and anything using it) to function correctly.
> 
> 
> We do have fencing enabled in the cluster; we've tested both node level
> fencing and resource fencing; DLM behaved identically in both scenarios,
> until we set it to 'enable_fencing=0' in the dlm.conf file. 
>  
> 
>     Basically, cluster config changes (node declared lost), dlm informed and
>     blocks, fence attempt begins and loops until it succeeds, on success,
>     informs DLM, dlm reaps locks held by the lost node and normal operation
>     continues.
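
As a rough illustration of the sequence quoted above, here is a toy
Python sketch. It is not the DLM or dlm_controld API: Lockspace,
handle_node_loss, the fence callback, max_attempts, and the node and
resource names are all made up for the example. It only shows the order
of events: block, retry the fence until it succeeds, reap the lost
node's locks, resume.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class Lockspace:
    """Toy stand-in for a DLM lockspace (hypothetical, not the real API)."""
    locks: Dict[str, str] = field(default_factory=dict)  # resource -> holder node
    blocked: bool = False

    def reap(self, node: str) -> Set[str]:
        """Drop every lock held by `node` and return the freed resources."""
        freed = {res for res, holder in self.locks.items() if holder == node}
        for res in freed:
            del self.locks[res]
        return freed


def handle_node_loss(
    ls: Lockspace,
    lost_node: str,
    fence: Callable[[str], bool],
    max_attempts: Optional[int] = None,
) -> int:
    """Block, fence until success, reap the lost node's locks, unblock.

    `fence` returns True once the lost node is confirmed fenced.
    `max_attempts` exists only so the demo can terminate; the flow
    described in the thread retries until fencing succeeds.
    Returns the number of fence attempts made.
    """
    ls.blocked = True  # lock activity stops while the node's state is unknown
    attempts = 0
    while True:
        attempts += 1
        if fence(lost_node):
            break
        if max_attempts is not None and attempts >= max_attempts:
            # The lockspace is deliberately left blocked: without a
            # confirmed fence it is not safe to release the node's locks.
            raise RuntimeError(f"could not fence {lost_node}")
    ls.reap(lost_node)  # safe only after the fence is confirmed
    ls.blocked = False
    return attempts


# Example: the fence agent fails twice, then succeeds.
ls = Lockspace(locks={"vg0/lv_data": "node2", "vg0/lv_logs": "node1"})
results = iter([False, False, True])
attempts = handle_node_loss(ls, "node2", lambda node: next(results))
```

In this sketch a fence that never succeeds leaves the lockspace blocked,
which is the behaviour the description above implies.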
> 
> This isn't quite what I was seeing in the logs.  The "failed" node would
> be fenced off, pacemaker appeared to be sane, reporting services running
> on the running nodes, but once the failed node was seen as missing by
> dlm (dlm_controld), dlm would request fencing, from what I can tell by
> the log entry.  Here is an example of the suspect log entry:
> Sep 26 09:41:35 pcmk-test-1 dlm_controld[837]: 38 fence request 2 pid 1446 startup time 1537969264 fence_all dlm_stonith
>  
> 
>     This isn't a question of node count or other configuration concerns.
>     It's simply that you must have proper fencing for DLM.
> 
> 
> Can you speak more to what "proper fencing" is for DLM? 
> 
> Best,
> -Pat
> 
>   
> 
> On Mon, Oct 1, 2018 at 12:30 PM Digimer <lists at alteeve.ca> wrote:
> 
>     On 2018-10-01 12:04 PM, Ferenc Wágner wrote:
>     > Patrick Whitney <pwhitney at luminoso.com> writes:
>     >
>     >> I have a two node (test) cluster running corosync/pacemaker with DLM
>     >> and CLVM.
>     >>
>     >> I was running into an issue where when one node failed, the remaining
>     >> node would appear to do the right thing, from the pcmk perspective,
>     >> that is. It would create a new cluster (of one) and fence the other
>     >> node, but then, rather surprisingly, DLM would see the other node
>     >> offline, and it would go offline itself, abandoning the lockspace.
>     >>
>     >> I changed my DLM settings to "enable_fencing=0", disabling DLM
>     >> fencing, and our tests are now working as expected.
>     >
>     > I'm running a larger Pacemaker cluster with standalone DLM + cLVM
>     > (that is, they are started by systemd, not by Pacemaker).  I've seen
>     > weird DLM fencing behavior, but not what you describe above (though I
>     > ran with more than two nodes from the very start).  Actually, I don't
>     > even understand how it occurred to you to disable DLM fencing to fix
>     > that...
> 
>     Fencing in clustering is always required, but unlike pacemaker that lets
>     you turn it off and take your chances, DLM doesn't. You must have
>     working fencing for DLM (and anything using it) to function correctly.
> 
>     Basically, cluster config changes (node declared lost), dlm informed and
>     blocks, fence attempt begins and loops until it succeeds, on success,
>     informs DLM, dlm reaps locks held by the lost node and normal operation
>     continues.
> 
>     This isn't a question of node count or other configuration concerns.
>     It's simply that you must have proper fencing for DLM.
> 
>     >> I'm a little concerned I have masked an issue by doing this, as in all
>     >> of the tutorials and docs I've read, there is no mention of having to
>     >> configure DLM whatsoever.
>     >
>     > Unfortunately it's very hard to come by any reliable info about DLM.
>     > I had a couple of enlightening exchanges with David Teigland (its
>     > primary author) on this list, he is very helpful indeed, but I'm
>     > still very far from having a working understanding of it.
>     >
>     > But I've been running with --enable_fencing=0 for years without
>     > issues, leaving all fencing to Pacemaker.  Note that manual cLVM
>     > operations are the only users of DLM here, so delayed fencing does
>     > not cause any problems, the cluster services do not depend on DLM
>     > being operational (I mean it can stay frozen for several days -- as
>     > it happened in a couple of pathological cases).  GFS2 would be a
>     > very different thing, I guess.
>     >
> 
> 
> 
> 
> 
> -- 
> Patrick Whitney
> DevOps Engineer -- Tools


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould


