[ClusterLabs] Re: Salvaging aborted resource migration

Thu Sep 27 07:36:26 UTC 2018

Hi!

Obviously you violated the most important cluster rule, which is "be patient".
Maybe the next most important one is "Don't change the configuration while the
cluster is not in the IDLE state" ;-)

I feel these are issues that should be fixed, but the above rules make your
life easier while these issues still exist.
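Applied to the ban/clear sequence in the quoted message below, the second rule could look like the minimal shell sketch that follows. It assumes that the installed crm_resource supports --wait (check crm_resource --help); the function name ban_then_clear is made up for illustration. The ban is removed only after the cluster has settled, so the --clear no longer aborts the transition that is carrying out the migration.

```shell
#!/bin/sh
# Sketch: move a resource away with a temporary ban, let the cluster
# settle, and only then remove the ban again.
# Assumption: this crm_resource supports --wait.
# ban_then_clear is a hypothetical helper name, not a Pacemaker tool.
ban_then_clear() {
    rsc="$1"
    # Creates a cli-ban-<rsc>-on-<node> location constraint for the
    # node the resource currently runs on; the cluster starts moving it.
    crm_resource --ban -r "$rsc" || return
    # Block until the cluster reaches a stable state, i.e. the migration
    # and any follow-up actions have finished.
    crm_resource --wait || return
    # Only now delete the constraint, so no running migration is interrupted.
    crm_resource --clear -r "$rsc"
}
```

Usage: `ban_then_clear vm-conv-4`. Each step stops the sequence on failure, so the ban is left in place if the cluster never settles.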

Regards,
Ulrich

>>> Ferenc Wágner <wagner.ferenc at kifu.gov.hu> wrote on 27.09.2018 at 08:37
in message <87tvmb5ttw.fsf at lant.ki.iif.hu>:
> Hi,
> 
> This is how Pacemaker 1.1.16 currently behaves when a migration of a
> resource implementing push migration is cancelled:
> 
> # /usr/sbin/crm_resource --ban -r vm-conv-4
> 
> vhbl03 crmd[10017]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
> vhbl03 pengine[10016]:   notice: Migrate vm-conv-4#011(Started vhbl07 -> vhbl04)
> vhbl03 crmd[10017]:   notice: Initiating migrate_to operation vm-conv-4_migrate_to_0 on vhbl07
> vhbl03 pengine[10016]:   notice: Calculated transition 4633, saving inputs in /var/lib/pacemaker/pengine/pe-input-1069.bz2
> [...]
> 
> At this point, with the migration still ongoing, I wanted to get rid of
> the constraint:
> 
> # /usr/sbin/crm_resource --clear -r vm-conv-4
> 
> vhbl03 crmd[10017]:   notice: Transition aborted by deletion of rsc_location[@id='cli-ban-vm-conv-4-on-vhbl07']: Configuration change
> vhbl07 crmd[10233]:   notice: Result of migrate_to operation for vm-conv-4 on vhbl07: 0 (ok)
> vhbl03 crmd[10017]:   notice: Transition 4633 (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-1069.bz2): Stopped
> vhbl03 pengine[10016]:   notice: Resource vm-conv-4 can no longer migrate to vhbl04. Stopping on vhbl07 too
> vhbl03 pengine[10016]:   notice: Reload  vm-conv-4#011(Started vhbl07)
> vhbl03 pengine[10016]:   notice: Calculated transition 4634, saving inputs in /var/lib/pacemaker/pengine/pe-input-1070.bz2
> vhbl03 crmd[10017]:   notice: Initiating stop operation vm-conv-4_stop_0 on vhbl07
> vhbl03 crmd[10017]:   notice: Initiating stop operation vm-conv-4_stop_0 on vhbl04
> vhbl03 crmd[10017]:   notice: Initiating reload operation vm-conv-4_reload_0 on vhbl04
> 
> This recovery was entirely unnecessary, as the resource successfully
> migrated to vhbl04 (the migrate_from operation does nothing).  Pacemaker
> does not know this, but is there a way to educate it?  I think in this
> special case it is possible to redesign the agent making migrate_to a
> no-op and doing everything in migrate_from, which would significantly
> reduce the window between the start points of the two "halves", but I'm
> not sure that would help in the end: Pacemaker could still decide to do
> an unnecessary stop+start recovery.  Would it?  I failed to find any
> documentation on recovery from aborted migration transitions.  I don't
> expect on-fail (for migrate_* ops, not me) to apply here, does it?
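The agent change proposed above (a no-op migrate_to, with the whole push driven from migrate_from) might be structured like the shell sketch below. It only illustrates the split: migrate_vm is a hypothetical placeholder for the agent's own migration code, the other actions (start, stop, monitor and so on) are omitted, and a real agent would source ocf-shellfuncs for the OCF return codes instead of defining them. It does not by itself settle whether Pacemaker would still choose a stop+start recovery after an aborted transition.

```shell
#!/bin/sh
# Sketch of a resource agent where migrate_to does nothing and the whole
# migration happens in migrate_from (which Pacemaker runs on the target).
# migrate_vm and the agent_* names are hypothetical, not an OCF API.

# Standard OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3

migrate_vm() {
    # Placeholder: a real agent would contact the source node "$1"
    # and push the guest to "$2" here.
    echo "migrate_vm: not implemented" >&2
    return $OCF_ERR_GENERIC
}

agent_migrate_to() {
    # Intentionally empty: nothing happens on the source node any more.
    return $OCF_SUCCESS
}

agent_migrate_from() {
    # Pacemaker passes the node names to migrate_* actions in these variables.
    migrate_vm "$OCF_RESKEY_CRM_meta_migrate_source" \
               "$OCF_RESKEY_CRM_meta_migrate_target" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

agent_dispatch() {
    case "$1" in
        migrate_to)   agent_migrate_to ;;
        migrate_from) agent_migrate_from ;;
        *)            return $OCF_ERR_UNIMPLEMENTED ;;  # other actions omitted
    esac
}
```

Because migrate_from runs on the target node, the placeholder has to reach the source node itself; both node names are available there, so no state needs to be handed over from migrate_to.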
> 
> Side question: why initiate a reload in any case, like above?
> 
> An even more tangential question: could you please consider using a space
> instead of a TAB in syslog messages?  (Actually, I wouldn't mind getting rid
> of them altogether in any output.)
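The #011 in the log lines above is most likely rsyslog's default escaping of control characters: '#' followed by the character's three-digit octal code, and 011 is TAB. Assuming that escaping, a small filter like the one below makes such logs easier to read until the messages change. The function name detab_syslog is made up, and it would also rewrite a literal "#011" that genuinely appears in a message.

```shell
#!/bin/sh
# Replace rsyslog's escaped TAB (#011) with a single space.
# detab_syslog is a hypothetical helper name; reads stdin, writes stdout.
detab_syslog() {
    sed 's/#011/ /g'
}
```

Usage, for example: `grep pengine /var/log/syslog | detab_syslog` (the log file location varies by distribution).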
> -- 
> Thanks,
> Feri
> _______________________________________________
> Users mailing list: Users at clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org