[ClusterLabs] Resource Parameter Change Not Honoring Constraints

Thu Mar 12 10:51:15 EDT 2020

On Wed, 2020-03-11 at 17:24 -0400, Marc Smith wrote:
> Hi,
> 
> I'm using Pacemaker 1.1.20 (yes, I know, a bit dated now). I noticed

I'd still consider that recent :)

> when I modify a resource parameter (e.g., update the value), this causes
> the resource itself to restart. And that's fine, but when this
> resource is restarted, it doesn't appear to honor the full set of
> constraints for that resource.
> 
> I see the output like this (right after the resource parameter
> change):
> ...
> Mar 11 20:43:25 localhost crmd[1943]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
> Mar 11 20:43:25 localhost crmd[1943]:   notice: Current ping state: S_POLICY_ENGINE
> Mar 11 20:43:25 localhost pengine[1942]:   notice: Clearing failure of p_bmd_140c58-1 on 140c58-1 because resource parameters have changed
> Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_bmd_140c58-1             (                   140c58-1 )   due to resource definition change
> Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_dummy_g_lvm_140c58-1     (                   140c58-1 )   due to required g_md_140c58-1 running
> Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_lvm_140c58_vg_01         (                   140c58-1 )   due to required p_dummy_g_lvm_140c58-1 start
> Mar 11 20:43:25 localhost pengine[1942]:   notice: Calculated transition 41, saving inputs in /var/lib/pacemaker/pengine/pe-input-173.bz2
> Mar 11 20:43:25 localhost crmd[1943]:   notice: Initiating stop operation p_lvm_140c58_vg_01_stop_0 on 140c58-1
> Mar 11 20:43:25 localhost crmd[1943]:   notice: Transition aborted by deletion of lrm_rsc_op[@id='p_bmd_140c58-1_last_failure_0']: Resource operation removal
> Mar 11 20:43:25 localhost crmd[1943]:   notice: Current ping state: S_TRANSITION_ENGINE
> ...
> 
> The stop on 'p_lvm_140c58_vg_01' then times out, because the other
> constraint (to stop the service above LVM) is never executed. I can
> see from the messages it never even tries to demote the resource
> above that.
> 
> Yet, if I use crmsh at the shell and do a restart on that same
> resource, it works correctly, and all constraints are honored:
> 
>   crm resource restart p_bmd_140c58-1
> 
> I can certainly provide my full cluster config if needed, but hoping
> to keep this email concise for clarity. =)
> 
> I guess my questions are: 1) Is the difference in restart behavior
> expected, i.e., are not all constraints followed when resource
> parameters change (or on some other restart event that originates
> internally like this)? 2) Or is this perhaps a known bug that was
> already resolved in newer versions of Pacemaker?

No to both. Can you attach that pe-input-173.bz2 file (with any
sensitive info removed)?
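
For anyone who wants to look at that file before sending it, here is a
rough sketch of replaying the saved scheduler input locally. It assumes
crm_simulate (from the Pacemaker command-line tools) is installed and
that the path shown in the log above is readable on the node; if either
assumption fails, it only prints the command it would have run.

```shell
# Path taken from the "saving inputs in ..." log line quoted above.
pe_input=/var/lib/pacemaker/pengine/pe-input-173.bz2

# Command that replays the saved input and shows the actions the
# scheduler would take (assumes crm_simulate is on PATH).
replay_cmd="crm_simulate --simulate --xml-file $pe_input"

if command -v crm_simulate >/dev/null 2>&1 && [ -r "$pe_input" ]; then
    crm_simulate --simulate --xml-file "$pe_input"
else
    # Tool or file not available here: just show the command.
    echo "would run: $replay_cmd"
fi
```

Comparing the actions it lists against the ordering constraints in the
configuration may show whether the missing demote was never scheduled
or was scheduled but lost when the transition was aborted.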
> 
> I searched a bit for #2 but I didn't get many (well, any) hits on other
> users experiencing this behavior.
> 
> Many thanks in advance.
> 
> --Marc
-- 
Ken Gaillot <kgaillot at redhat.com>