[ClusterLabs] Pacemaker resource parameter reload confusion

Fri Oct 20 10:57:06 EDT 2017

On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote:
> Ken Gaillot <kgaillot at redhat.com> writes:
> 
> > On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
> > > Ken Gaillot <kgaillot at redhat.com> writes:
> > > 
> > > > Hmm, stop+reload is definitely a bug. Can you attach (or email it to
> > > > me privately, or file a bz with it attached) the above pe-input file
> > > > with any sensitive info removed?
> > > 
> > > I sent you the pe-input file privately.  It indeed shows the issue:
> > > 
> > > $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
> > > [...]
> > > Executing cluster transition:
> > >  * Resource action: vm-alder        stop on vhbl05
> > >  * Resource action: vm-alder        reload on vhbl05
> > > [...]
> > > 
> > > Hope you can easily get to the bottom of this.
> > 
> > This turned out to have the same underlying cause as CLBZ#5309. I have
> > a fix pending review, which I expect to make it into the soon-to-be-released
> > 1.1.18.
> 
> Great!
> 
> > It is a regression introduced in 1.1.15 by commit 2558d76f. The logic
> > for reloads was consolidated in one place, but that happened to be
> > before restarts were scheduled, so it no longer had the right
> > information about whether a restart was needed. Now, it sets an
> > ordering flag that is used later to cancel the reload if the restart
> > becomes required. I've also added a regression test for it.
> 
> Restarts shouldn't even enter the picture here, so I don't get your
> explanation.  But I also don't know the code, so that doesn't mean a
> thing.  I'll test the next RC to be sure.

:-)

Reloads are done in place of restarts, when circumstances allow. So
reloads are always related to (potential) restarts.

The problem arose because not all of the relevant circumstances are
known at the time the reload action is created. We may figure out later
that a resource the reloading resource depends on must be restarted,
therefore the reloading resource must be fully restarted instead of
reloaded. E.g. a database resource might otherwise be able to reload,
but not if the filesystem it's using is going away.

Previously in those cases, we would end up scheduling both the reload
and the restart. Now, we schedule only the restart.
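To make the two-phase problem concrete, here is a toy Python model. It is NOT Pacemaker's scheduler code (which is written in C); `Resource`, `Action`, `schedule`, and the `fixed` switch are hypothetical names chosen for illustration. Phase 1 picks reloads before restarts are known; phase 2 propagates restarts to dependents. With `fixed=False` the database resource ends up with both a reload and a stop/start (the stop+reload symptom); with `fixed=True` a flag on the earlier reload action cancels it once a restart becomes required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    depends_on: tuple = ()       # resources this one needs (e.g. a filesystem)
    can_reload: bool = False     # its change could be applied by a reload
    needs_restart: bool = False  # its own change requires a full restart


@dataclass
class Action:
    task: str
    rsc: str
    cancelled: bool = False  # hypothetical stand-in for the "cancel this reload" flag


def schedule(resources, fixed=True):
    """Return the set of (task, resource) actions that would run (ordering ignored)."""
    actions = []
    reloads = {}

    # Phase 1: reloads are chosen before restarts are fully known.
    for r in resources:
        if r.can_reload and not r.needs_restart:
            action = Action("reload", r.name)
            actions.append(action)
            reloads[r.name] = action

    # Phase 2: a restart propagates to everything that depends on it.
    restarting = {r.name for r in resources if r.needs_restart}
    changed = True
    while changed:
        changed = False
        for r in resources:
            if r.name not in restarting and any(d in restarting for d in r.depends_on):
                restarting.add(r.name)
                changed = True

    for name in restarting:
        actions += [Action("stop", name), Action("start", name)]
        # Post-fix behavior: a restart decided later cancels the earlier reload.
        if fixed and name in reloads:
            reloads[name].cancelled = True

    return {(a.task, a.rsc) for a in actions if not a.cancelled}


fs = Resource("fs", needs_restart=True)
db = Resource("db", depends_on=("fs",), can_reload=True)
print(sorted(schedule([fs, db], fixed=False)))  # db gets a reload AND a stop/start
print(sorted(schedule([fs, db], fixed=True)))   # db gets only a stop/start
```

If the filesystem does not need a restart (`Resource("fs")`), the same database resource is scheduled for a reload only, which is the case where a reload can legitimately stand in for a restart.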
-- 
Ken Gaillot <kgaillot at redhat.com>