[ClusterLabs] Resource Parameter Change Not Honoring Constraints

Wed Mar 11 17:24:10 EDT 2020

Hi,

I'm using Pacemaker 1.1.20 (yes, I know, a bit dated now). I noticed
that when I modify a resource parameter (e.g., update its value), the
resource itself restarts. That's fine, but when the resource is
restarted this way, it doesn't appear to honor the full set of
constraints for that resource.

I see output like this (right after the resource parameter change):
...
Mar 11 20:43:25 localhost crmd[1943]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 11 20:43:25 localhost crmd[1943]:   notice: Current ping state: S_POLICY_ENGINE
Mar 11 20:43:25 localhost pengine[1942]:   notice: Clearing failure of p_bmd_140c58-1 on 140c58-1 because resource parameters have changed
Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_bmd_140c58-1             (                   140c58-1 )   due to resource definition change
Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_dummy_g_lvm_140c58-1     (                   140c58-1 )   due to required g_md_140c58-1 running
Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_lvm_140c58_vg_01         (                   140c58-1 )   due to required p_dummy_g_lvm_140c58-1 start
Mar 11 20:43:25 localhost pengine[1942]:   notice: Calculated transition 41, saving inputs in /var/lib/pacemaker/pengine/pe-input-173.bz2
Mar 11 20:43:25 localhost crmd[1943]:   notice: Initiating stop operation p_lvm_140c58_vg_01_stop_0 on 140c58-1
Mar 11 20:43:25 localhost crmd[1943]:   notice: Transition aborted by deletion of lrm_rsc_op[@id='p_bmd_140c58-1_last_failure_0']: Resource operation removal
Mar 11 20:43:25 localhost crmd[1943]:   notice: Current ping state: S_TRANSITION_ENGINE
...

The stop on 'p_lvm_140c58_vg_01' then times out, because the other
constraint (which should stop the service above LVM first) is never
acted on. I can see from the messages that it never even tries to
demote the resource above LVM.
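
In case it helps anyone look at this, the scheduler input saved for
transition 41 can be replayed offline with crm_simulate, which should
list the actions the policy engine planned (and so whether the demote
was ever scheduled). This is only a minimal sketch: it assumes the
pe-input file named in the log above is still present on the node, and
the PE_INPUT variable name is just illustrative.

```shell
# Saved scheduler input named in the log above (transition 41).
PE_INPUT=/var/lib/pacemaker/pengine/pe-input-173.bz2

# Replay the transition offline: --simulate runs the scheduler and
# --xml-file reads the CIB from the saved file instead of the live cluster.
# Guarded so this is a no-op where crm_simulate or the file is missing.
if command -v crm_simulate >/dev/null 2>&1 && [ -r "$PE_INPUT" ]; then
    crm_simulate --simulate --xml-file "$PE_INPUT"
fi
```

Nothing here touches the running cluster; the simulation works only on
the saved input.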

Yet if I use crmsh at the shell to restart that same resource, it
works correctly, and all constraints are honored:

    crm resource restart p_bmd_140c58-1

I can certainly provide my full cluster config if needed, but hoping
to keep this email concise for clarity. =)

I guess my questions are:

1) Is the difference in restart behavior expected, i.e., are not all
   constraints followed when a restart is triggered by a resource
   parameter change (or by some other restart event that originates
   internally like this)?
2) Or is this perhaps a known bug that was already resolved in newer
   versions of Pacemaker?

I searched a bit for #2, but I didn't get many (well, any) hits from
other users experiencing this behavior.

Many thanks in advance.

--Marc