[ClusterLabs] Pacemaker resource parameter reload confusion

Ken Gaillot kgaillot at redhat.com
Tue Oct 31 11:11:36 EDT 2017


On Tue, 2017-10-31 at 09:33 +0100, Ferenc Wágner wrote:
> Ken Gaillot <kgaillot at redhat.com> writes:
> 
> > On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote:
> > 
> > > Ken Gaillot <kgaillot at redhat.com> writes:
> > > 
> > > > On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
> > > > 
> > > > > Ken Gaillot <kgaillot at redhat.com> writes:
> > > > > 
> > > > > > Hmm, stop+reload is definitely a bug. Can you attach (or email it
> > > > > > to me privately, or file a bz with it attached) the above pe-input
> > > > > > file with any sensitive info removed?
> > > > > 
> > > > > I sent you the pe-input file privately.  It indeed shows the issue:
> > > > > 
> > > > > $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
> > > > > [...]
> > > > > Executing cluster transition:
> > > > >  * Resource action: vm-alder        stop on vhbl05
> > > > >  * Resource action: vm-alder        reload on vhbl05
> > > > > [...]
> > > > > 
> > > > > Hope you can easily get to the bottom of this.
> > > > 
> > > > This turned out to have the same underlying cause as CLBZ#5309. I
> > > > have a fix pending review, which I expect to make it into the
> > > > soon-to-be-released 1.1.18.
> > > 
> > > Great!
> > > 
> > > > It is a regression introduced in 1.1.15 by commit 2558d76f. The
> > > > logic for reloads was consolidated in one place, but that happened
> > > > to be before restarts were scheduled, so it no longer had the right
> > > > information about whether a restart was needed. Now, it sets an
> > > > ordering flag that is used later to cancel the reload if the restart
> > > > becomes required. I've also added a regression test for it.
> > > 
> > > Restarts shouldn't even enter the picture here, so I don't get your
> > > explanation.  But I also don't know the code, so that doesn't mean a
> > > thing.  I'll test the next RC to be sure.
> > 
> > :-)
> > 
> > Reloads are done in place of restarts, when circumstances allow. So
> > reloads are always related to (potential) restarts.
> > 
> > The problem arose because not all of the relevant circumstances are
> > known at the time the reload action is created. We may figure out later
> > that a resource the reloading resource depends on must be restarted,
> > therefore the reloading resource must be fully restarted instead of
> > reloaded. E.g. a database resource might otherwise be able to reload,
> > but not if the filesystem it's using is going away.
> > 
> > Previously in those cases, we would end up scheduling both the reload
> > and the restart. Now, we schedule only the restart.
> 
> Hi Ken,
> 
> 1.1.18-rc3 indeed schedules a restart, not a reload, like 1.1.16 did.
> However, this wasn't my problem, I really expect a reload on the change
> of a non-unique parameter.  The problem was that 1.1.16 also executed a
> stop action in parallel with the reload.
> 
> Maybe I test it wrong: I just copied the pe-input file to another system
> (which doesn't even know this resource agent) running 1.1.18-rc3 and
> gave it to crm_simulate.  Does the pe-input file contain all the
> information necessary to decide between restart and reload?  The
> op-force-restart attribute does not contain the name of the changed
> parameter, but I can't find any info on what changed at all.  Should I
> see a clean reload in this test setup at all?

The pe-input is indeed entirely sufficient.

I forgot to check why the reload was not possible in this case. It
turns out it is this:

   trace: check_action_definition:      Resource vm-alder doesn't know how to reload

Does the resource agent implement the "reload" action and advertise it
in the <actions> section of its metadata?
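
One rough way to check this is to parse the agent's metadata and look for an
action named "reload". The sketch below is only an illustration: the sample
metadata is a made-up, trimmed-down example (not the vm-alder agent's real
metadata), and the helper names advertised_actions and advertises_reload are
hypothetical. In practice the XML would be whatever the agent prints when
invoked with its meta-data action.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down OCF metadata used only for illustration.
# A real agent prints its own XML when run with the "meta-data" action.
SAMPLE_METADATA = """<resource-agent name="example-vm">
  <version>1.0</version>
  <parameters>
    <parameter name="config" unique="1" required="1">
      <content type="string"/>
    </parameter>
    <parameter name="loglevel" unique="0">
      <content type="string" default="info"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="60s"/>
    <action name="stop" timeout="60s"/>
    <action name="monitor" timeout="30s" interval="10s"/>
    <action name="reload" timeout="30s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
"""


def advertised_actions(metadata_xml):
    """Return the set of action names listed under <actions>."""
    root = ET.fromstring(metadata_xml)
    return {action.get("name") for action in root.findall("./actions/action")}


def advertises_reload(metadata_xml):
    """True if the metadata lists a "reload" action."""
    return "reload" in advertised_actions(metadata_xml)


if __name__ == "__main__":
    print("reload advertised:", advertises_reload(SAMPLE_METADATA))
```

Advertising the action is only half of the question above; the agent script
itself also has to handle being called with "reload".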
-- 
Ken Gaillot <kgaillot at redhat.com>



