[Pacemaker] Build dlm_controld for pacemaker stack (dlm_controld.pcmk)

Vladislav Bogdanov bubble at hoster-ok.com
Fri Nov 2 03:22:30 EDT 2012


02.11.2012 02:05, Andrew Beekhof wrote:
> On Thu, Nov 1, 2012 at 5:09 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>> 01.11.2012 02:47, Andrew Beekhof wrote:
>> ...
>>>>
>>>> One remark about that - it requires that gfs2 communicates with dlm in
>>>> the kernel space - so gfs_controld is no longer required. I think
>>>> Fedora 17 is the first version with that feature. And it is definitely
>>>> not available for EL6 (centos6 which I use).
>>>>
>>>> But I have preliminary success running GFS2 with corosync2 and pacemaker
>>>> 1.1.8 on EL6. dlm4 runs just fine as is (although it misses some
>>>> features on EL6 because of the kernel). And it still includes the
>>>> (undocumented) option enable_fscontrol, so user-space communication with fs
>>>> control daemons is supported. Even if that feature is removed
>>>> upstream, it can easily be restored - just several lines of code.
>>>> And I ported gfs_controld from cman to corosync2 (the patch is still very
>>>> dirty, made with scissors and needle, just a proof-of-concept that it can
>>>> work at all). Some features are unsupported (e.g. nodir) and will not be
>>>> implemented by me.
>>>
>>> I'm impressed.  What was the motivation though?  You really really
>>> don't like CMAN? :-)
>>
>> Why should I like software which is going to die? ;)
>>
>> I believe that how things are done currently (third case from your list)
>> fully reflects my "perfectionistic" needs. I had many problems with
>> cman+pacemaker in the past. The most critical is that pacemaker and
>> dlm_controld react differently when a node reappears very soon after
>> it was lost (because pacemaker uses totem (?) directly for membership, but
>> dlm uses CPG).
> 
> We both get it from the CPG and quorum APIs for option 3.

Yes, but not for 1 or 2. I saw the described behavior with both of
them, but not with 3.
That's why I decided to go with 3, which I think is conceptually right.

> 
>> Pacemaker accepts that, but controld freezes lockspaces,
>> waiting for fencing. But fencing is never done because nobody handles
>> the "node lost" CPG event.
> 
> WTF.  Pacemaker should absolutely do this.  Bug report?

Sorry for being unclear.
I saw that with both 1 and 2 (where pacemaker did not use CPG), until I
"fixed" fencing at the dlm layer for 1. I modified it to request fencing if
a "node down" event occurs, and then did not see freezes anymore. From what
I understand, the "node down" CPG event occurs when corosync forms a
transitional membership (at least pacemaker logged lines about that at
the same time as the dlm freeze). And if a stable membership occurs
(milli-)seconds after the transitional one, pacemaker (as of probably 1.1.6)
did not fence the re-appeared node. I can understand that - pacemaker can
absolutely live with that. But dlm cannot. And it is dlm's task to do
proper fencing in case it cannot work, not pacemaker's. But that piece
was missing there. The same is (probably, I may be damn wrong here) true
for cman - I did a quick search for a CPG "node down" handler in its
sources but didn't find one. I suspect it was handled by some deprecated
daemon (e.g. groupd) in the past, but as of 3.1.7 I did not observe any
handling for that.
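
To make the difference concrete, here is a toy model of the behavior
described above. It is only a sketch in Python and has nothing to do with
the real dlm or pacemaker sources: the class, the method names and the
fence_on_node_down switch are all made up for illustration. It assumes that
any member loss freezes the lockspaces, that only a fencing confirmation
unfreezes them, and that the cluster manager never fences a node that
rejoins quickly on its own.

```python
from dataclasses import dataclass, field
from enum import Enum


class LeaveReason(Enum):
    # Hypothetical labels for the two cases discussed in this thread.
    PROC_DOWN = "process lost"
    NODE_DOWN = "node lost"


@dataclass
class Lockspace:
    name: str
    frozen: bool = False


@dataclass
class Controld:
    """Toy stand-in for the daemon's membership handling (not real dlm code)."""
    fence_on_node_down: bool
    lockspaces: list = field(default_factory=list)
    fence_requests: list = field(default_factory=list)

    def on_member_left(self, nodeid: int, reason: LeaveReason) -> None:
        # Either kind of loss freezes the lockspaces until fencing completes.
        for ls in self.lockspaces:
            ls.frozen = True
        # Unpatched behavior: only "process lost" triggers a fencing request.
        if reason is LeaveReason.PROC_DOWN or self.fence_on_node_down:
            self.fence_requests.append(nodeid)

    def on_fenced(self, nodeid: int) -> None:
        # A fencing confirmation is the only thing that unfreezes lockspaces.
        if nodeid in self.fence_requests:
            self.fence_requests.remove(nodeid)
            if not self.fence_requests:
                for ls in self.lockspaces:
                    ls.frozen = False


def simulate(fence_on_node_down: bool) -> bool:
    """Return True if the lockspace stays frozen after a brief node loss."""
    d = Controld(fence_on_node_down, [Lockspace("gfs2")])
    d.on_member_left(2, LeaveReason.NODE_DOWN)
    # The node rejoins moments later and nobody else fences it, so only
    # requests made by the daemon itself are ever serviced.
    for nodeid in list(d.fence_requests):
        d.on_fenced(nodeid)
    return d.lockspaces[0].frozen
```

With fence_on_node_down=False the lockspace stays frozen forever, which is
the hang I saw; with fence_on_node_down=True the daemon requests fencing
itself and the lockspace is released once fencing is confirmed.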

Since I am going with option 3, I should not see that anymore, even in theory.

So no bug report for what I won't use anymore :)

> 
>> dlm does start fencing for "process lost", but
>> not for "node lost".
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org