[Pacemaker] bug in ordering syntax?

Frank DiMeo Frank.DiMeo at bigbandnet.com
Wed Dec 2 13:38:26 EST 2009


I applied the patch and added a score of INFINITY to the rsc_order tag.  Unfortunately, the behavior remains the same.  The log and XML file are enclosed.
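
For reference, the constraint with the score added would look roughly like the sketch below.  This is an assumption based on the two-resource snippet quoted further down, not a copy of the enclosed file (which, going by its name, defines four resources).  The ids and the resource names world1 and world2 are taken from that snippet.

```xml
<!-- Sketch only: compact ordering syntax with an explicit score on rsc_order. -->
<!-- Assumes the schema patch mentioned in the quoted reply has been applied.  -->
<rsc_order id="order-1" score="INFINITY">
  <resource_set id="order-1-set-1" sequential="true">
    <resource_ref id="world1" />
    <resource_ref id="world2" />
  </resource_set>
</rsc_order>
```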

-Frank

> -----Original Message-----
> From: Frank DiMeo [mailto:Frank.DiMeo at bigbandnet.com]
> Sent: Wednesday, December 02, 2009 9:22 AM
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] bug in ordering syntax?
> 
> Ask and ye shall receive. :)
> 
> I'm enclosing my openais init script, which I'm running on my two node
> cluster made up of identical Ubuntu (9.04) machines called ubuntu_2 and
> ubuntu_1.  I'm running Pacemaker 1.0.6 from the tip as of about a month ago.
> 
> I'm also enclosing two sets of files which may help you see what's
> happening.
> 
> The "working" set:
> 
> 4rsc_worlds_coloc_ordered.xml - this is my initial configuration file.
> When I use this to initialize my cluster, the 4 resources all start up in
> order, on the right node, and move together when I put nodes in and out
> of standby.
> 
> goodconfig_debug.txt - the log file (from ubuntu_1) showing what
> happens when the resources are running on node "ubuntu_2" and I put
> that node into standby.  All resources are moved to "ubuntu_1".  If I
> stop openais, everything shuts down quickly and cleanly, and no processes
> (like lrmd, pengine, etc) are left running.
> 
> The "not working" set:
> 
> 4rsc_worlds_coloc_ordered_alt1.xml - this is identical to the xml file
> in the working set, except I use the compact syntax for ordering.
> 
> badconfig_debug.txt - the log file (from ubuntu_1) showing what happens
> when the resources are running on node "ubuntu_2" and I put that node
> into standby.  The pe wants to move them to ubuntu_1, but the pe only
> seems to generate "pseudo actions" and never really moves anything.
> The resources continue to run on node ubuntu_2 even when the node is in
> standby!  Further, if I try to shut down openais on ubuntu_2 at this
> point (using the /etc/init.d/openais script enclosed), after a long
> time, corosync stops, but lrmd and pengine keep running, and become
> children of the init process.  Again, the resources keep running even
> at this point, which is because they are never commanded to stop.
> 
> I can send you my RA's and the resources themselves (which are just
> bash scripts) if you'd like.
> 
> I'll apply the patch you pointed to and let you know what happens.
> 
> Thanks very much,
> -Frank
> 
> 
> > -----Original Message-----
> > From: Andrew Beekhof [mailto:andrew at beekhof.net]
> > Sent: Wednesday, December 02, 2009 6:00 AM
> > To: pacemaker at oss.clusterlabs.org
> > Subject: Re: [Pacemaker] bug in ordering syntax?
> >
> > On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo
> > <Frank.DiMeo at bigbandnet.com> wrote:
> > > I'm experimenting with startup sequence and co-location control, and think I
> > > may have stumbled across a bug.
> > >
> > >
> > >
> > > I have two xml files that I use in my testing as my initial configuration of
> > > a two node cluster.  I start each node with no configuration, and then use
> > > cibadmin to "source in" the xml file.  Each file defines two resources as
> > > well as a startup order and collocation definition.  The only difference
> > > between the two files is the syntax I use to specify the startup order.
> > >
> > >
> > >
> > > When I use the syntax:
> > >
> > >
> > >
> > > <rsc_order id="order-1" first="world1" then="world2" score="INFINITY" />
> > >
> > >
> > >
> > > Everything works fine.  I can put either of the two nodes into standby while
> > > resources are running there, and the resources move to the other node as
> > > expected.
> > >
> > >
> > >
> > > However, when I use the syntax:
> > >
> > >
> > >
> > > <rsc_order id="order-1">
> >
> > You're missing a score.  Without one it defaults to 0 (which means
> > optional).
> > However, IIRC, the 1.0.6 schema won't allow you to set a score there
> > so you'll need to apply the following patch:
> >    http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c
> >
> > >
> > >   <resource_set id="order-1-set-1" sequential="true">
> > >     <resource_ref id="world1" />
> > >     <resource_ref id="world2" />
> > >   </resource_set>
> > > </rsc_order>
> > >
> > >
> > >
> > >
> > >
> > > Several bad things happen.  First, the resources don't move off the node
> > > that is put into standby, even though the alternate node is running and able
> > > to run the resources.
> >
> > Did you remove the other ordering constraint first?
> >
> > > Second, attempting to shut down openais on the node running the
> > > resources after attempting a forced move (by putting the node
> > > into standby) leaves both the lrmd and pengine processes running
> > > (but as children of process 1 (init)), and the resources continue to run
> > > on that node even after openais is stopped.
> >
> > I suspect you've a faulty init script there.  See other email.
> >
> > > I turned debug on in crmd and in the logs and recorded what happens when I
> > > force standby, and I notice that using the first syntax causes
> > > te_rsc_command to be executed to send a shut down message to the node where
> > > the resources are running (which seems to work), while using the second
> > > syntax causes te_pseudo_action to be called in approximately the same place
> > > in the log, but no shutdown of resources happens (I can't really tell what
> > > this is supposed to be doing).
> >
> > Neither can I - you didn't attach the logs :-)
> >
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: badconfig_rscwithscore.txt
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20091202/d3913bb1/attachment-0003.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4rsc_worlds_coloc_ordered_alt1.xml
Type: text/xml
Size: 3960 bytes
Desc: 4rsc_worlds_coloc_ordered_alt1.xml
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20091202/d3913bb1/attachment-0003.xml>

