[Pacemaker] Build dlm_controld for pacemaker stack (dlm_controld.pcmk)

Wed Nov 7 22:36:58 EST 2012

On Mon, Nov 5, 2012 at 5:33 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 05.11.2012 08:40, Andrew Beekhof wrote:
>> On Fri, Nov 2, 2012 at 6:22 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>> 02.11.2012 02:05, Andrew Beekhof wrote:
>>>> On Thu, Nov 1, 2012 at 5:09 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>>> 01.11.2012 02:47, Andrew Beekhof wrote:
>>>>> ...
>>>>>>>
>>>>>>> One remark about that - it requires that gfs2 communicates with dlm in
>>>>>>> the kernel space - so gfs_controld is no longer required. I think
>>>>>>> Fedora 17 is the first version with that feature. And it is definitely
>>>>>>> not available for EL6 (centos6 which I use).
>>>>>>>
>>>>>>> But I have preliminary success running GFS2 with corosync2 and pacemaker
>>>>>>> 1.1.8 on EL6. dlm4 runs just fine as is (although it misses some
>>>>>>> features on EL6 because of the kernel). And it still includes the
>>>>>>> undocumented option enable_fscontrol, so user-space communication with fs
>>>>>>> control daemons is supported. Even if that feature is removed
>>>>>>> upstream, it can easily be restored - just several lines of code.
>>>>>>> And I ported gfs_controld from cman to corosync2 (the patch is still very
>>>>>>> dirty, made with scissors and needle, just a proof-of-concept that it can
>>>>>>> work at all). Some features are unsupported (e.g. nodir) and will not be
>>>>>>> implemented by me.
>>>>>>
>>>>>> I'm impressed.  What was the motivation though?  You really really
>>>>>> don't like CMAN? :-)
>>>>>
>>>>> Why should I like software which is going to die? ;)
>>>>>
>>>>> I believe that the way things are done currently (the third case from your
>>>>> list) fully reflects my "perfectionistic" needs. I had many problems with
>>>>> cman+pacemaker in the past. The most critical is that pacemaker and
>>>>> dlm_controld react differently when a node reappears very soon after
>>>>> it was lost (because pacemaker uses totem ? directly for membership, but
>>>>> dlm uses CPG).
>>>>
>>>> We both get it from the CPG and quorum APIs for option 3.
>>>
>>> Yes, but not for 1 nor for 2.
>>
>> Not quite. We used to ignore it for option 2, but not anymore.
>> Option 2 uses CPG for messaging.
>>
>>> I saw the described behavior with both of
>>> them, but not with 3.
>>> That's why I decided to go with 3, which I think is conceptually right.
>>>
>>>>
>>>>> Pacemaker accepts that, but controld freezes lockspaces,
>>>>> waiting for fencing. But fencing is never done because nobody handles
>>>>> "node lost" CPG event.
>>>>
>>>> WTF.  Pacemaker should absolutely do this.  Bug report?
>>>
>>> Sorry for being unclear.
>>> I saw that with both 1 and 2 (where pacemaker did not use CPG), until I
>>> "fixed" fencing at the dlm layer for 1. I modified it to request fencing if a
>>> "node down" event occurs and then did not see freezes anymore. From what
>>> I understand, the "node down" CPG event occurs when corosync forms a
>>> transitional membership (at least pacemaker logged lines about that at
>>> the same time as the dlm freeze). And if a stable membership occurs
>>> (milli-)seconds after the transitional one, pacemaker (as of probably 1.1.6)
>>> did not fence the re-appeared node. I can understand that - pacemaker can
>>> absolutely live with that. But dlm cannot.
>>
>> Right. Any sort of membership hiccup is fatal as far as the dlm is concerned.
>> But even with options 1 and 2, it should still make a fencing request.
>
> I'm afraid not. At least not with 3.0.17 or 3.1.7.

Actually the system as a whole does, but you have to know where to look.
It's fenced that triggers the node fencing on a CPG change.

Look for

		if (left_list[i].reason == CPG_REASON_NODEDOWN ||
		    left_list[i].reason == CPG_REASON_PROCDOWN) {
			memb->failed = 1;
			cg->failed_count++;

in add_change() in fence/fenced/cpg.c

and later:

	/* failed nodes in this change become victims */
	add_victims(fd, cg);
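
To make that flow concrete, here is a stripped-down sketch of the idea.
It is not the fenced code: the enum, the structs, MAX_NODES and
collect_victims() are made-up stand-ins for corosync's cpg_reason_t,
the CPG left_list and fenced's own bookkeeping. It only shows the rule
that an unclean departure (node down or process down) turns the node
into a fencing victim, while a clean leave does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for corosync's cpg_reason_t values. */
enum leave_reason {
    REASON_LEAVE,     /* process left the group cleanly */
    REASON_NODEDOWN,  /* the whole node dropped out of membership */
    REASON_PROCDOWN   /* the process died without leaving */
};

/* Simplified stand-in for one entry of the CPG left_list. */
struct left_entry {
    uint32_t nodeid;
    enum leave_reason reason;
};

#define MAX_NODES 16

/* Simplified stand-in for the per-change state that fenced keeps. */
struct change {
    uint32_t victims[MAX_NODES];  /* node ids that must be fenced */
    size_t failed_count;          /* number of valid entries in victims[] */
};

/*
 * Walk the list of departed members and record every unclean departure
 * as a victim. Clean leaves are ignored. Returns the number of victims.
 */
static size_t collect_victims(const struct left_entry *left_list, size_t n,
                              struct change *cg)
{
    assert(cg != NULL);
    assert(left_list != NULL || n == 0);

    cg->failed_count = 0;
    for (size_t i = 0; i < n && cg->failed_count < MAX_NODES; i++) {
        if (left_list[i].reason == REASON_NODEDOWN ||
            left_list[i].reason == REASON_PROCDOWN) {
            cg->victims[cg->failed_count++] = left_list[i].nodeid;
        }
    }
    return cg->failed_count;
}
```

Given a left_list with one clean leave, one node-down and one
process-down entry, only the last two end up as victims.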

A better understanding of these interdependencies is why we no longer
recommend starting cman via directives in corosync.conf - because that
won't start fenced and the other bits that are needed.

CTS has also improved to test the integration better.
We've spent a lot of time recently specifically making sure that
cman/fenced initiated fencing works just as well as pacemaker
initiated fencing does.

> The sources are clear
> about that - a CPG node down event does not result in fencing being
> requested by dlm_controld. And that was a major problem for me with options
> 1 and 2. A one-line patch solved that though. But I decided that cman is a
> no-go for me anymore because such critical issues as proper fencing should
> be tested thoroughly, and if they are not, then I will feel like I am
> sitting on a bomb with it.
>
>>
>> Without fence_pcmk in cluster.conf that request might have gotten
>> lost, but with 1.1.8 I would expect the node to be shot - regardless
>> of whether the rest of Pacemaker thought it was ok.
>> That's why going direct to stonithd was an important change.
>
> Aha. The last time I tried cman was before fence_pcmk was written (and
> before the fencing call that dlm_controld.pcmk uses was modified to go
> straight to stonithd). I recall I was polishing option 1 at that time (after
> throwing cman away), and the first revision of that move did not work
> because it used an async libstonithd call to fence a node. That's why I used
> direct calls to stonith in my version of dlm_controld.pcmk. All that resulted
> in a fully working stack, and I decided to go with option 3 only after hearing
> from you that you do not test pacemaker with corosync1 yourselves anymore.
>
> That was the second major problem with option 1 - before all those changes
> there was a possibility for a fencing request to be dropped silently. And
> I actually hit that. I do not know if it fully works with stock 3.0.17
> dlm_controld.pcmk (I suspect not because of issue 1) but with my builds
> it is stable.
>
> Anyway, I seem to be happy with option 3 on EL6: it introduces a clean
> and straightforward model of the cluster stack and it works perfectly, so I
> do not see any reason to go back to option 1 or 2.

Happy to hear it.
I'm not actually trying to make you stop using it :)

>
>
>>
>>> And it is its task to do
>>> proper fencing in case it cannot work, not pacemaker's. But that piece
>>> was missing there. The same is (probably, I may be damn wrong here) true
>>> for cman - I did a quick search for a CPG "node down" handler in its
>>> sources but didn't find one. I suspect it was handled by some deprecated
>>> daemon (e.g. groupd) in the past, but as of 3.1.7 I did not observe
>>> handling for that.
>>>
>>> As I go with option 3, I should not see that anymore even theoretically.
>>>
>>> So no bug report for what I won't use anymore :)
>>>
>>>>
>>>>> dlm does start fencing for "process lost", but
>>>>> not for "node lost".
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org