[Pacemaker] [Partially SOLVED] pacemaker/dlm problems

Andrew Beekhof andrew at beekhof.net
Wed Nov 23 23:33:01 EST 2011

On Tue, Nov 15, 2011 at 7:36 AM, Vladislav Bogdanov
<bubble at hoster-ok.com> wrote:
> Hi Andrew,
> I just found another problem with dlm_controld.pcmk (with your latest
> patch from github applied and also my fixes to actually build it - they
> are included in a message referenced by this one).
> One node, which had just requested fencing of another one, gets stuck
> printing the message where you print ctime() in fence_node_time()
> (pacemaker.c near line 293) every second.

So not blocked, it just keeps repeating that message?
What date does it print?

Did you change it to the following?
  log_debug("Node %d was last shot at: %s", nodeid, ctime(*last_fenced_time));
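
One thing to watch with the call above: ctime() takes a pointer to a time_t.
If last_fenced_time points to a 64-bit integer (an assumption, based on how it
is dereferenced above), the value has to be copied into a time_t first. A
minimal sketch, where format_fence_time() is a hypothetical helper and not
part of dlm_controld:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of dlm_controld: formats a fence
 * timestamp given as seconds since the epoch into buf and returns buf. */
static const char *format_fence_time(uint64_t last_fenced_time,
                                     char *buf, size_t len)
{
    time_t t = (time_t)last_fenced_time;  /* ctime_r() wants a time_t pointer */
    char tmp[26];                         /* ctime_r() needs at least 26 bytes */

    assert(buf != NULL && len > 0);

    if (ctime_r(&t, tmp) == NULL) {
        snprintf(buf, len, "(invalid time)");
        return buf;
    }
    tmp[strcspn(tmp, "\n")] = '\0';       /* drop the trailing newline */
    snprintf(buf, len, "%s", tmp);
    return buf;
}
```

With that in place, the log line could pass
format_fence_time(*last_fenced_time, buf, sizeof(buf)) as the %s argument,
with buf being a local char array of 32 bytes or so.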

> No other messages appear, although
> fence_node_time() is called only from check_fencing_done() (cpg.c near
> 444). So both (last_fenced_time >= node->fail_time) and
> (!node->fence_queries || node->fence_time != last_fenced_time) are
> false; otherwise one of the messages for those cases would be shown. Then
> fence_node_time() seems to return 0 from
> if (wait_count)
>        return 0;
> (wait_count is incremented if (last_fenced_time >= node->fail_time) is
> false), so it never reaches the check_fencing_done() call and never returns
> the expected 1.
> The offending node was actually fenced, but dlm_controld did not handle
> that.
> May I ask you to help me a bit with all that logic (as you have already
> dived into the dlm_controld sources again)? I seem to be so near
> success... :|
> By the way, I can't find which source your dlm repo was forked from;
> maybe you remember?

iirc, it was dlm.git on fedorahosted.
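
For reference, the wait logic described in the quoted message can be modelled
in a few lines. This is only a sketch built from that description: the struct
and the function names (node_model, check_one_node, check_fencing_done_model)
are hypothetical and simplified, not copied from cpg.c. It shows why the node
can wait silently: the "waiting" message is only printed on the first query or
when the reported fence time changes, so if the reported time never reaches
fail_time the check keeps returning 0 without logging anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node record; field names follow the quoted
 * description, not the actual dlm_controld headers. */
struct node_model {
    uint64_t fail_time;      /* when the node was seen to fail */
    uint64_t fence_time;     /* last fence time reported to us */
    int      fence_queries;  /* number of "still waiting" queries logged */
    int      check_fencing;  /* 1 while we still wait for fencing */
};

/* Returns 1 if the node counts as fenced, 0 if we keep waiting.
 * *logged is set to 1 only when a message would be printed. */
static int check_one_node(struct node_model *node,
                          uint64_t last_fenced_time, int *logged)
{
    assert(node != NULL && logged != NULL);
    *logged = 0;

    if (last_fenced_time >= node->fail_time) {
        *logged = 1;                  /* "fencing done" message */
        node->check_fencing = 0;
        node->fence_queries = 0;
        return 1;
    }
    if (!node->fence_queries || node->fence_time != last_fenced_time) {
        *logged = 1;                  /* "waiting" message */
        node->fence_queries++;
        node->fence_time = last_fenced_time;
    }
    return 0;                         /* caller counts this as a wait */
}

/* Models the loop over nodes plus the "if (wait_count) return 0;" exit.
 * *logs receives the number of messages that would have been printed. */
static int check_fencing_done_model(struct node_model *nodes, size_t n,
                                    const uint64_t *last_fenced, int *logs)
{
    int wait_count = 0;
    size_t i;

    *logs = 0;
    for (i = 0; i < n; i++) {
        int logged;

        if (!nodes[i].check_fencing)
            continue;
        if (!check_one_node(&nodes[i], last_fenced[i], &logged))
            wait_count++;
        *logs += logged;
    }
    return wait_count ? 0 : 1;
}
```

In this model, a node with fail_time 100 and a reported fence time stuck at 0
logs once and then returns 0 silently on every later call, which matches the
symptom described; it only completes once the reported time is >= 100.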

> Best,
> Vladislav
> 28.09.2011 17:41, Vladislav Bogdanov wrote:
>> Hi Andrew,
>>>> All the more reason to start using the stonith api directly.
>>>> I was playing around last night with the dlm_controld.pcmk code:
>>>>    https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>>> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
>>> my build. Then it doesn't compile without the attached patch.
>>> It may need to be rebased a bit against your tree.
>>> Now I have the package built and am building node images. Will try shortly.
>> Fencing from within dlm_controld.pcmk still did not work with your first
>> patch against that _no_mainloop function (as expected).
>> So I did my best to build packages from the current git tree.
>> Voila! The failed node was fenced correctly!
>> I'll do some more extensive testing over the next few days, but I believe
>> everything should be much better now.
>> I knew you're a genius, he-he ;)
>> So, here are the steps to get DLM to handle CPG NODEDOWN events correctly
>> with pacemaker using the openais stack:
>> 1. Build pacemaker (as of 2011-09-28) from git.
>> 2. Apply attached patches to cluster-3.0.17 source tree.
>> 3. Build dlm_controld.pcmk
>> One note - gfs2_controld probably needs to be fixed too (FIXME).
>> Best regards,
>> Vladislav
