[ClusterLabs Developers] Delayed node reboot

Tue Sep 5 10:18:55 EDT 2023

On Fri, 2023-09-01 at 08:58 -0500, dennis.r.lacroix--- via Developers
wrote:
> I am working on an application where I need to reboot individual
> nodes in the cluster as quickly as possible.  In order to do so, I am
> first putting the node into standby mode and then shutting down
> Pacemaker.  This works well in most cases - except when the node I am
> shutting down is the DC and has Master resources that need to be
> promoted on another node.  In this case, the Pacemaker shutdown seems
> to be delayed until both the local resources are stopped AND the
> remote resources are promoted and/or started.  This causes an
> unacceptable delay in the reboot of the node.
> 
> Am I correctly interpreting why the Pacemaker shutdown is taking so
> long?  Is there any way to fix this?  The ideal solution would seem to
> be to force the DC to another node before putting the node into
> standby mode, but there doesn’t seem to be a mechanism to do so.  Is
> there another way to deal with this?

Correct, the DC will finish its last transition before shutting down.

There is currently no way to influence the DC election. A long time ago
there was a custom code modification that showed it's feasible, but it's
not on the roadmap now.
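
Since the election can't be steered, an application can at least check
whether the node it is about to reboot is the current DC, and so
anticipate the longer shutdown. The following is only a rough sketch,
not a tested procedure: it assumes "crmadmin -D" is available and
reports the DC on a line of the form "Designated Controller is: <node>",
and that the cluster node name matches the local hostname (that may not
hold on every setup). The helper names are made up for illustration.

```python
import re
import socket
import subprocess

# Assumed output format of "crmadmin -D"; adjust if your version differs.
DC_LINE = re.compile(r"Designated Controller is:\s*(\S+)")


def parse_dc(output):
    """Return the DC node name found in crmadmin-style output, or None."""
    match = DC_LINE.search(output)
    return match.group(1) if match else None


def current_dc():
    """Ask the cluster for the current DC by running "crmadmin -D"."""
    result = subprocess.run(
        ["crmadmin", "-D"], capture_output=True, text=True, check=True
    )
    # Check both streams, since where the line is printed may vary by version.
    return parse_dc(result.stdout + result.stderr)


def local_node_is_dc(local_name=None):
    """True if the local node is the DC (assumes node name == hostname)."""
    local_name = local_name or socket.gethostname()
    return current_dc() == local_name


if __name__ == "__main__":
    if local_node_is_dc():
        print("This node is the DC; expect shutdown to wait for the transition.")
    else:
        print("This node is not the DC.")
```

This only detects the situation; it does not shorten the transition.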

> To summarize: I basically need to cleanly shut down the resources on
> the node and exit the cluster as quickly as possible so that I can
> reboot the node and rejoin the cluster with minimal delay.
> 
> Any suggestions are appreciated!
> Dennis LaCroix
-- 
Ken Gaillot <kgaillot at redhat.com>