[ClusterLabs] Antw: Salvaging aborted resource migration
Ken Gaillot
kgaillot at redhat.com
Thu Sep 27 10:31:32 EDT 2018
On Thu, 2018-09-27 at 09:36 +0200, Ulrich Windl wrote:
> Hi!
>
> Obviously you violated the most important cluster rule, which is "be
> patient". Maybe the next most important is "Don't change the
> configuration while the cluster is not in IDLE state" ;-)
Agreed -- although even when the cluster is idle, removing a ban can
result in a migration back (if something like stickiness doesn't
prevent it).
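
A minimal sketch of the "wait until idle, then change the configuration" advice, assuming crm_resource's --wait option is available in the installed version. The function name and the CRM_RESOURCE override are hypothetical, introduced only for illustration; the path and resource name come from the commands quoted below. Waiting does not by itself stop the resource from moving back once the ban is gone; that still depends on stickiness.

```shell
#!/bin/sh
# Path as used in the quoted commands; CRM_RESOURCE is a hypothetical
# override so the function can be exercised without a live cluster.
CRM_RESOURCE="${CRM_RESOURCE:-/usr/sbin/crm_resource}"

# Clear a ban only after the cluster has settled.
# $1: resource name, e.g. vm-conv-4
clear_ban_when_settled() {
    rsc="$1"
    # Assumption: --wait blocks until no actions are pending and
    # returns non-zero if it gives up.
    "$CRM_RESOURCE" --wait || return 1
    "$CRM_RESOURCE" --clear -r "$rsc"
}
```

Usage would be `clear_ban_when_settled vm-conv-4` in place of calling `crm_resource --clear` directly while a migration is still in flight.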
There's currently no way to tell pacemaker that an operation (i.e.
migrate_from) is a no-op and can be ignored. If a migration is only
partially completed, it has to be considered a failure and reverted.
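
To spot a transition that was cut short, the counters in the crmd "Transition N (...)" summary line quoted below can be pulled out of syslog. This is only a sketch: the function names are made up for illustration, and treating a non-zero Incomplete count as "did not run to completion" is my reading of that log line, not a documented interface.

```shell
#!/bin/sh
# Print one counter (Complete, Pending, Fired, Skipped, Incomplete)
# from a crmd transition summary line read on stdin.
# $1: counter name
transition_counter() {
    # Require "(" or a space before the name so that "Complete"
    # does not match inside "Incomplete".
    sed -n "/Transition [0-9][0-9]* (/s/.*[( ]$1=\([0-9][0-9]*\).*/\1/p"
}

# Succeed if the single summary line on stdin reports Incomplete > 0.
transition_was_cut_short() {
    incomplete=$(transition_counter Incomplete)
    [ -n "$incomplete" ] && [ "$incomplete" -gt 0 ]
}
```

For example, `grep 'Transition 4633 (' /var/log/syslog | transition_was_cut_short` would succeed for the log excerpt in this thread (the syslog path is an assumption and varies by distribution).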
I'm not sure why the reload was scheduled; I suspect it's a bug due to
a restart being needed but no parameters having changed. There should
be special handling for a partial migration to make the stop required.
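
To look into why the reload was scheduled, the saved policy engine input for the second transition can be replayed offline with crm_simulate. A small sketch follows; the wrapper name and the CRM_SIMULATE override are hypothetical, and the file path is the one from the quoted log.

```shell
#!/bin/sh
# CRM_SIMULATE is a hypothetical override for testing without Pacemaker.
CRM_SIMULATE="${CRM_SIMULATE:-crm_simulate}"

# Replay a saved pe-input file and show the actions it would schedule.
# $1: path to the pe-input file
replay_transition() {
    "$CRM_SIMULATE" --simulate --xml-file "$1"
}
```

Here that would be `replay_transition /var/lib/pacemaker/pengine/pe-input-1070.bz2`, run on the node that saved the file (vhbl03 in the log).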
> I feel these are issues that should be fixed, but the above rules make
> your life easier while these issues still exist.
>
> Regards,
> Ulrich
>
> > > > Ferenc Wágner <wagner.ferenc at kifu.gov.hu> wrote on 27.09.2018
> > > > at 08:37 in message <87tvmb5ttw.fsf at lant.ki.iif.hu>:
> > Hi,
> >
> > The current behavior of cancelled migration with Pacemaker 1.1.16
> > with a resource implementing push migration:
> >
> > # /usr/sbin/crm_resource --ban -r vm-conv-4
> >
> > vhbl03 crmd[10017]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> > vhbl03 pengine[10016]: notice: Migrate vm-conv-4#011(Started vhbl07 -> vhbl04)
> > vhbl03 crmd[10017]: notice: Initiating migrate_to operation vm-conv-4_migrate_to_0 on vhbl07
> > vhbl03 pengine[10016]: notice: Calculated transition 4633, saving inputs in /var/lib/pacemaker/pengine/pe-input-1069.bz2
> > [...]
> >
> > At this point, with the migration still ongoing, I wanted to get rid
> > of the constraint:
> >
> > # /usr/sbin/crm_resource --clear -r vm-conv-4
> >
> > vhbl03 crmd[10017]: notice: Transition aborted by deletion of rsc_location[@id='cli-ban-vm-conv-4-on-vhbl07']: Configuration change
> > vhbl07 crmd[10233]: notice: Result of migrate_to operation for vm-conv-4 on vhbl07: 0 (ok)
> > vhbl03 crmd[10017]: notice: Transition 4633 (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-1069.bz2): Stopped
> > vhbl03 pengine[10016]: notice: Resource vm-conv-4 can no longer migrate to vhbl04. Stopping on vhbl07 too
> > vhbl03 pengine[10016]: notice: Reload vm-conv-4#011(Started vhbl07)
> > vhbl03 pengine[10016]: notice: Calculated transition 4634, saving inputs in /var/lib/pacemaker/pengine/pe-input-1070.bz2
> > vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on vhbl07
> > vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on vhbl04
> > vhbl03 crmd[10017]: notice: Initiating reload operation vm-conv-4_reload_0 on vhbl04
> >
> > This recovery was entirely unnecessary, as the resource successfully
> > migrated to vhbl04 (the migrate_from operation does nothing).
> > Pacemaker does not know this, but is there a way to educate it? I
> > think in this special case it is possible to redesign the agent,
> > making migrate_to a no-op and doing everything in migrate_from, which
> > would significantly reduce the window between the start points of the
> > two "halves", but I'm not sure that would help in the end: Pacemaker
> > could still decide to do an unnecessary stop+start recovery. Would
> > it? I failed to find any documentation on recovery from aborted
> > migration transitions. I don't expect on-fail (for migrate_* ops, not
> > me) to apply here, does it?
> >
> > Side question: why initiate a reload in any case, like above?
> >
> > Even more side question: could you please consider using space
> > instead of TAB in syslog messages? (Actually, I wouldn't mind getting
> > rid of them altogether in any output.)
> > --
> > Thanks,
> > Feri
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
--
Ken Gaillot <kgaillot at redhat.com>