[ClusterLabs] Salvaging aborted resource migration
Ferenc Wágner
wagner.ferenc at kifu.gov.hu
Thu Sep 27 02:37:47 EDT 2018
Hi,
This is how Pacemaker 1.1.16 currently behaves when a migration is
cancelled for a resource implementing push migration:
# /usr/sbin/crm_resource --ban -r vm-conv-4
vhbl03 crmd[10017]: notice: State transition S_IDLE -> S_POLICY_ENGINE
vhbl03 pengine[10016]: notice: Migrate vm-conv-4#011(Started vhbl07 -> vhbl04)
vhbl03 crmd[10017]: notice: Initiating migrate_to operation vm-conv-4_migrate_to_0 on vhbl07
vhbl03 pengine[10016]: notice: Calculated transition 4633, saving inputs in /var/lib/pacemaker/pengine/pe-input-1069.bz2
[...]
At this point, with the migration still ongoing, I wanted to get rid of
the constraint:
# /usr/sbin/crm_resource --clear -r vm-conv-4
vhbl03 crmd[10017]: notice: Transition aborted by deletion of rsc_location[@id='cli-ban-vm-conv-4-on-vhbl07']: Configuration change
vhbl07 crmd[10233]: notice: Result of migrate_to operation for vm-conv-4 on vhbl07: 0 (ok)
vhbl03 crmd[10017]: notice: Transition 4633 (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-1069.bz2): Stopped
vhbl03 pengine[10016]: notice: Resource vm-conv-4 can no longer migrate to vhbl04. Stopping on vhbl07 too
vhbl03 pengine[10016]: notice: Reload vm-conv-4#011(Started vhbl07)
vhbl03 pengine[10016]: notice: Calculated transition 4634, saving inputs in /var/lib/pacemaker/pengine/pe-input-1070.bz2
vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on vhbl07
vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on vhbl04
vhbl03 crmd[10017]: notice: Initiating reload operation vm-conv-4_reload_0 on vhbl04
This recovery was entirely unnecessary, as the resource successfully
migrated to vhbl04 (the migrate_from operation does nothing). Pacemaker
does not know this, but is there a way to educate it? I think that in
this special case the agent could be redesigned so that migrate_to is a
no-op and migrate_from does everything, which would significantly reduce
the window between the start points of the two "halves". But I'm not
sure that would help in the end: Pacemaker could still decide to do an
unnecessary stop+start recovery. Would it? I failed to find any
documentation on recovery from aborted migration transitions. I don't
expect on-fail (for migrate_* ops, not me) to apply here, or does it?
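To make the redesign idea concrete, here is a minimal sketch of what I
mean by a "pull" style agent. It is not the real vm-conv-4 agent:
pull_vm_from and agent_main are hypothetical names, and the transfer
itself is only a placeholder. The only parts taken from the OCF
conventions are the return codes and the
OCF_RESKEY_CRM_meta_migrate_source variable that Pacemaker sets for
migrate_from.

```shell
#!/bin/sh
# Hypothetical sketch of a pull-style agent; NOT the real vm-conv-4 agent.

# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3

# Placeholder for the hypervisor-specific transfer (assumption):
# a real agent would pull the running VM from node "$1" here.
pull_vm_from() {
    echo "pulling VM from $1"
}

agent_migrate_to() {
    # Intentionally a no-op: the source node does no work of its own.
    return $OCF_SUCCESS
}

agent_migrate_from() {
    # Pacemaker passes the name of the source node in this variable.
    src="${OCF_RESKEY_CRM_meta_migrate_source:-}"
    if [ -z "$src" ]; then
        return $OCF_ERR_GENERIC
    fi
    pull_vm_from "$src" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

# Dispatch on the action name, as an agent would do with "$1".
agent_main() {
    case "$1" in
        migrate_to)   agent_migrate_to ;;
        migrate_from) agent_migrate_from ;;
        *)            return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

With this layout the whole transfer happens inside migrate_from on the
target node, so the window I am worried about shrinks to the time
between the two operations being initiated. Whether that changes what
Pacemaker does after an aborted transition is exactly what I don't know.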
Side question: why is a reload initiated at all, as in the log above?
Even more of a side question: could you please consider using spaces
instead of TABs in syslog messages? (Actually, I wouldn't mind getting
rid of TABs altogether in any output.)
--
Thanks,
Feri