[Pacemaker] Build dlm_controld for pacemaker stack (dlm_controld.pcmk)

Mon Nov 5 00:40:06 EST 2012

On Fri, Nov 2, 2012 at 6:22 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 02.11.2012 02:05, Andrew Beekhof wrote:
>> On Thu, Nov 1, 2012 at 5:09 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>> 01.11.2012 02:47, Andrew Beekhof wrote:
>>> ...
>>>>>
>>>>> One remark about that - it requires that gfs2 communicates with dlm in
>>>>> the kernel space - so gfs_controld is no longer required. I think
>>>>> Fedora 17 is the first version with that feature. And it is definitely
>>>>> not available for EL6 (centos6 which I use).
>>>>>
>>>>> But I have preliminary success running GFS2 with corosync2 and pacemaker
>>>>> 1.1.8 on EL6. dlm4 runs just fine as is (although it misses some
>>>>> features on EL6 because of the kernel). And it still includes the
>>>>> undocumented option enable_fscontrol, so user-space communication with fs
>>>>> control daemons is supported. Even if that feature is removed
>>>>> upstream, it can easily be restored - just several lines of code.
>>>>> And I ported gfs_controld from cman to corosync2 (patch is very dirty
>>>>> yet, made with scissors and needle, just a proof-of-concept that it even
>>>>> can work). Some features are unsupported (e.g. nodir) and will not be
>>>>> implemented by me.
>>>>
>>>> I'm impressed.  What was the motivation though?  You really really
>>>> don't like CMAN? :-)
>>>
>>> Why should I like software which is going to die? ;)
>>>
>>> I believe that how things are done currently (third case from your list)
>>> fully reflects my "perfectionistic" needs. I had many problems with
>>> cman+pacemaker in the past. The most critical is that pacemaker and
>>> dlm_controld react differently when a node reappears very soon after
>>> it was lost (because pacemaker uses totem (?) directly for membership, but
>>> dlm uses CPG).
>>
>> We both get it from the CPG and quorum APIs for option 3.
>
> Yes, but not for 1 or 2.

Not quite. We used to ignore it for option 2, but not anymore.
Option 2 uses CPG for messaging.

> I saw the described behavior with both of
> them, but not with 3.
> That's why I decided to go with 3, which I think is conceptually right.
>
>>
>>> Pacemaker accepts that, but controld freezes lockspaces,
>>> waiting for fencing. But fencing is never done because nobody handles
>>> "node lost" CPG event.
>>
>> WTF.  Pacemaker should absolutely do this.  Bug report?
>
> Sorry for being unclear.
> I saw that with both 1 and 2 (where pacemaker did not use CPG), until I
> "fixed" fencing at the dlm layer for 1. I modified it to request fencing if
> a "node down" event occurs and then did not see freezes anymore. From what
> I understand, a "node down" CPG event occurs when corosync forms a
> transitional membership (at least pacemaker logged lines about that at
> the same time as the dlm freeze). And if a stable membership occurs
> (milli-)seconds after the transitional one, pacemaker (as of probably 1.1.6)
> did not fence the re-appeared node. I can understand that - pacemaker can
> absolutely live with that. But dlm cannot.

Right. Any sort of membership hiccup is fatal as far as the dlm is concerned.
But even with options 1 and 2, it should still make a fencing request.
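
To illustrate the difference, here is a toy model in Python. It is only a
sketch: it does not use the corosync CPG API or any dlm/pacemaker code, and
the class names, node names and event sequence are made up for illustration.
It assumes that each consumer simply receives a series of membership sets.

```python
from dataclasses import dataclass, field


@dataclass
class TolerantMember:
    """Hypothetical pacemaker-like consumer: only the latest membership matters."""
    members: set = field(default_factory=set)

    def on_membership(self, new_members):
        # A node that left and came straight back is simply accepted again.
        self.members = set(new_members)

    def fence_requests(self):
        return []


@dataclass
class StrictMember:
    """Hypothetical dlm-like consumer: any departure must end in a fencing request."""
    members: set = field(default_factory=set)
    pending_fence: list = field(default_factory=list)

    def on_membership(self, new_members):
        new_members = set(new_members)
        # Every node that disappeared, however briefly, is queued for fencing.
        for node in sorted(self.members - new_members):
            if node not in self.pending_fence:
                self.pending_fence.append(node)
        self.members = new_members

    def fence_requests(self):
        return list(self.pending_fence)


def replay(consumer, memberships):
    """Feed a sequence of membership sets to a consumer and collect its fencing requests."""
    for membership in memberships:
        consumer.on_membership(membership)
    return consumer.fence_requests()


# node3 drops out in a transitional membership and is back moments later.
HICCUP = [
    {"node1", "node2", "node3"},
    {"node1", "node2"},
    {"node1", "node2", "node3"},
]

if __name__ == "__main__":
    print(replay(TolerantMember(), HICCUP))  # no fencing requested
    print(replay(StrictMember(), HICCUP))    # node3 is queued for fencing
```

The strict consumer still wants node3 fenced even though the final membership
looks healthy, which is the request that must not get lost.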

Without fence_pcmk in cluster.conf that request might have gotten
lost, but with 1.1.8 I would expect the node to be shot - regardless
of whether the rest of Pacemaker thought it was ok.
That's why going direct to stonithd was an important change.

> And it is its task to do
> proper fencing in case it cannot work, not pacemaker's. But that piece
> was missing there. The same is (probably, I may be damn wrong here) true
> for cman - I did a quick search for a CPG "node down" handler in its
> sources but didn't find one. I suspect it was handled by some deprecated
> daemon (e.g. groupd) in the past, but as of 3.1.7 I did not observe
> handling for that.
>
> As I go with option 3, I should not see that anymore even theoretically.
>
> So no bug report for what I won't use anymore :)
>
>>
>>> dlm does start fencing for "process lost", but
>>> not for "node lost".
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org