[ClusterLabs] Pacemaker resource parameter reload confusion

Tue Oct 31 08:33:07 UTC 2017

Ken Gaillot <kgaillot at redhat.com> writes:

> On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote:
>
>> Ken Gaillot <kgaillot at redhat.com> writes:
>> 
>>> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
>>>
>>>> Ken Gaillot <kgaillot at redhat.com> writes:
>>>> 
>>>>> Hmm, stop+reload is definitely a bug. Can you attach (or email it
>>>>> to me privately, or file a bz with it attached) the above pe-input
>>>>> file with any sensitive info removed?
>>>> 
>>>> I sent you the pe-input file privately.  It indeed shows the
>>>> issue:
>>>> 
>>>> $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
>>>> [...]
>>>> Executing cluster transition:
>>>>  * Resource action: vm-alder        stop on vhbl05
>>>>  * Resource action: vm-alder        reload on vhbl05
>>>> [...]
>>>> 
>>>> Hope you can easily get to the bottom of this.
>>> 
>>> This turned out to have the same underlying cause as CLBZ#5309. I
>>> have a fix pending review, which I expect to make it into the
>>> soon-to-be-released 1.1.18.
>> 
>> Great!
>> 
>>> It is a regression introduced in 1.1.15 by commit 2558d76f. The
>>> logic for reloads was consolidated in one place, but that happened
>>> to be before restarts were scheduled, so it no longer had the right
>>> information about whether a restart was needed. Now, it sets an
>>> ordering flag that is used later to cancel the reload if the restart
>>> becomes required. I've also added a regression test for it.
>> 
>> Restarts shouldn't even enter the picture here, so I don't get your
>> explanation.  But I also don't know the code, so that doesn't mean a
>> thing.  I'll test the next RC to be sure.
>
> :-)
>
> Reloads are done in place of restarts, when circumstances allow. So
> reloads are always related to (potential) restarts.
>
> The problem arose because not all of the relevant circumstances are
> known at the time the reload action is created. We may figure out later
> that a resource the reloading resource depends on must be restarted,
> therefore the reloading resource must be fully restarted instead of
> reloaded. E.g. a database resource might otherwise be able to reload,
> but not if the filesystem it's using is going away.
>
> Previously in those cases, we would end up scheduling both the reload
> and the restart. Now, we schedule only the restart.

Hi Ken,

1.1.18-rc3 indeed schedules a restart rather than a reload, as 1.1.16 did.
However, that wasn't my problem: I really do expect a reload when a
non-unique parameter changes.  The problem was that 1.1.16 also executed
a stop action in parallel with the reload.
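
The scheduling rule described in the quoted explanation (a reload is
created tentatively, and is replaced by a full restart if a resource it
depends on turns out to need a restart) can be modeled roughly as below.
This is a toy sketch under stated assumptions, not Pacemaker's actual
code: the Resource class, schedule() function and the params_changed /
needs_restart flags are all hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical model of a cluster resource, not a Pacemaker structure.
    name: str
    depends_on: list = field(default_factory=list)  # names of resources it needs
    params_changed: bool = False  # only reloadable (non-unique) params changed
    needs_restart: bool = False   # must restart for its own reasons


def schedule(resources):
    """Toy scheduler: map each resource name to its list of actions."""
    # Step 1: tentatively schedule a reload where only reloadable parameters
    # changed; we don't yet know whether a dependency will be restarted.
    actions = {r.name: (["reload"] if r.params_changed else []) for r in resources}

    # Step 2: find every resource that must restart, either for its own
    # reasons or because something it depends on restarts (transitively).
    restarting = {r.name for r in resources if r.needs_restart}
    changed = True
    while changed:
        changed = False
        for r in resources:
            if r.name not in restarting and any(d in restarting for d in r.depends_on):
                restarting.add(r.name)
                changed = True

    # Step 3: a restart supersedes the tentative reload; never emit both,
    # which is the stop+reload combination seen with 1.1.16.
    for name in restarting:
        actions[name] = ["stop", "start"]
    return actions
```

For example, a database whose filesystem must restart gets a full
restart, while a lone parameter change still yields a clean reload:

```python
fs = Resource("fs", needs_restart=True)
db = Resource("db", depends_on=["fs"], params_changed=True)
schedule([fs, db])                                    # both get stop, start
schedule([Resource("vm-alder", params_changed=True)])  # vm-alder gets reload
```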

Maybe I tested it wrong: I just copied the pe-input file to another system
(which doesn't even know this resource agent) running 1.1.18-rc3 and
gave it to crm_simulate.  Does the pe-input file contain all the
information necessary to decide between restart and reload?  The
op-force-restart attribute does not contain the name of the changed
parameter, but I can't find any info on what changed at all.  Should I
see a clean reload in this test setup at all?
-- 
Thanks,
Feri