[ClusterLabs] Missing success log message for resource migration

Fri Feb 19 16:18:06 EST 2021

On Fri, 2021-02-19 at 11:06 +0100, Ulrich Windl wrote:
> Hi!
> 
> Inspecting the logs after the cluster had rebalanced resources, I'm
> wondering:
> It looks as if pacemaker-controld does log a success message when a
> local migration succeeded, but not if a remote one did.
> 
> Actions planned:
> Migrate    prm_xen_test-jeos1            ( h16 -> h18 )
> Migrate    prm_xen_test-jeos2            ( h18 -> h19 )
> Migrate    prm_xen_test-jeos4            ( h16 -> h18 )
> 
> Like this (h19 being DC):
> 
> h19 pacemaker-controld[7539]:  notice: Initiating migrate_from
> operation prm_xen_test-jeos2_migrate_from_0 locally on h19
> h19 pacemaker-controld[7539]:  notice: Result of migrate_from
> operation for prm_xen_test-jeos2 on h19: ok
> 
> But for a non-local migration it looks quite different:
> h19 pacemaker-controld[7539]:  notice: Initiating migrate_to
> operation prm_xen_test-jeos1_migrate_to_0 on h16
> h19 pacemaker-controld[7539]:  notice: Initiating migrate_to
> operation prm_xen_test-jeos4_migrate_to_0 on h16

Correct, it's like this for all actions -- the DC just logs the
"Initiating" message, and the node that actually executes the action
logs the result. It just so happens that sometimes that's the same
node.

In the Pacemaker detail log (pacemaker.log), there are info-level
messages for results, currently looking like

 Transition ... action ... (... on ...) confirmed: ... 

or timed out or whatever.

> h19 pacemaker-controld[7539]:  notice: Initiating migrate_from
> operation prm_xen_test-jeos4_migrate_from_0 on h18
> h19 pacemaker-controld[7539]:  notice: Initiating migrate_from
> operation prm_xen_test-jeos1_migrate_from_0 on h18
> 
> h19 pacemaker-controld[7539]:  notice: Initiating stop operation
> prm_xen_test-jeos4_stop_0 on h16
> h19 pacemaker-controld[7539]:  notice: Initiating stop operation
> prm_xen_test-jeos1_stop_0 on h16
> 
> h19 pacemaker-controld[7539]:  notice: Initiating monitor operation
> prm_xen_test-jeos4_monitor_600000 on h18
> h19 pacemaker-controld[7539]:  notice: Initiating monitor operation
> prm_xen_test-jeos1_monitor_600000 on h18
> 
> h19 pacemaker-controld[7539]:  notice: State transition
> S_TRANSITION_ENGINE -> S_IDLE
> (no more messages for a while)
> 
> On the remote nodes I see:
> h18 pacemaker-controld[7131]:  notice: Result of migrate_from
> operation for prm_xen_test-jeos4 on h18: ok
> h18 pacemaker-controld[7131]:  notice: Result of migrate_from
> operation for prm_xen_test-jeos1 on h18: ok
> 
> h16 pacemaker-controld[7223]:  notice: Result of migrate_to operation
> for prm_xen_test-jeos4 on h16: ok
> h16 pacemaker-controld[7223]:  notice: Result of migrate_to operation
> for prm_xen_test-jeos1 on h16: ok
> 
> The other thing I noticed is that a start operation is logged with
> execution time like this:
> h19 pacemaker-execd[7536]:  notice: prm_cron_snap_test-jeos2 start
> (call 170, PID 11079) exited with status 0 (execution time 55ms,
> queue time 0ms)
> 
> But for a migration there is no such timing info (logged by the DC;
> it's logged on the remote node). Considering that migrations can 

Same situation -- it's the node that executes the action that knows how
long it took, so that's where the log is.

> time-out, wouldn't that be a useful info to have as well?
> 
> Regards,
> Ulrich
-- 
Ken Gaillot <kgaillot at redhat.com>