[ClusterLabs] node utilization attributes are lost during upgrade

Ken Gaillot kgaillot at redhat.com
Mon Aug 17 16:38:09 EDT 2020


On Mon, 2020-08-17 at 12:12 +0200, Kadlecsik József wrote:
> Hello,
> 
> When upgrading a corosync/pacemaker/libvirt/KVM cluster from Debian
> stretch to buster, all the node utilization attributes were erased from
> the configuration. However, the same attributes were kept on the
> VirtualDomain resources. As a result, all resources with utilization
> attributes were stopped.

Ouch :(

There are two types of node attributes, transient and permanent.
Transient attributes last only until pacemaker is next stopped on the
node, while permanent attributes persist between reboots/restarts.

If you configured the utilization attributes with crm_attribute
-z/--utilization, they default to permanent, but it's possible to
override that with -l/--lifetime reboot (or equivalently, -t/--type
status), which makes them transient.

Permanent node attributes should definitely not be erased in an
upgrade.
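
To see what is actually stored, you can compare a CIB dump (for example
the output of cibadmin --query) taken before and after the upgrade.
Permanent node utilization lives under the node entries in the
configuration section. Below is a minimal sketch of such a check in
Python, not an official tool: the sample XML, the node and resource
names, and the function names are made up for illustration, and it
assumes integer utilization values.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down CIB dump used only for illustration.
SAMPLE_CIB = """\
<cib>
  <configuration>
    <nodes>
      <node id="1" uname="node1">
        <utilization id="nodes-1-utilization">
          <nvpair id="nodes-1-utilization-cpu" name="cpu" value="8"/>
          <nvpair id="nodes-1-utilization-memory" name="memory" value="16384"/>
        </utilization>
      </node>
      <node id="2" uname="node2"/>
    </nodes>
    <resources>
      <primitive id="vm1" class="ocf" provider="heartbeat" type="VirtualDomain">
        <utilization id="vm1-utilization">
          <nvpair id="vm1-utilization-cpu" name="cpu" value="2"/>
          <nvpair id="vm1-utilization-memory" name="memory" value="4096"/>
        </utilization>
      </primitive>
    </resources>
  </configuration>
</cib>
"""


def utilization(element):
    """Return the name/value pairs of an element's <utilization> block."""
    return {
        nv.get("name"): int(nv.get("value"))
        for nv in element.findall("./utilization/nvpair")
    }


def utilization_report(cib_xml):
    """Collect permanent node and resource utilization from a CIB dump.

    Returns (nodes, resources, nodes_without_utilization).
    """
    root = ET.fromstring(cib_xml)
    nodes = {
        node.get("uname"): utilization(node)
        for node in root.findall("./configuration/nodes/node")
    }
    # Keep only resources that actually declare utilization requirements.
    resources = {}
    for prim in root.iter("primitive"):
        needs = utilization(prim)
        if needs:
            resources[prim.get("id")] = needs
    missing = sorted(name for name, caps in nodes.items() if not caps)
    return nodes, resources, missing


if __name__ == "__main__":
    nodes, resources, missing = utilization_report(SAMPLE_CIB)
    print("node utilization:", nodes)
    print("resource utilization:", resources)
    print("nodes without utilization attributes:", missing)
```

If resources declare utilization but some or all nodes show up in the
last list, you are in the situation described in the original report.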

> 
> The documentation says: "You can name utilization attributes according to
> your preferences and define as many name/value pairs as your
> configuration needs.", so one assumes utilization attributes are kept
> during upgrades, for nodes and resources as well.
> 
> The corosync incompatibility made the upgrade more stressful anyway, and
> the stopping of the resources came out of the blue. The resources could
> not be started, of course, and there were no warning or error messages in
> the log saying that the resources were not started because the
> utilization constraints could not be satisfied. Pacemaker logs a lot
> (from an admin's point of view, too much), but in this case there was no
> indication of why the resources could not be started (or were we unable
> to find it in the logs?). So we wasted a lot of time debugging the
> VirtualDomain agent.
> 
> Currently we run the cluster with placement-strategy set to default.
> 
> In my opinion, node attributes should be preserved during an upgrade.
> Also, it should be logged when a resource must be stopped or cannot be
> started because the utilization constraints cannot be satisfied.
> 
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.hu
> PGP key: https://wigner.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics
>          H-1525 Budapest 114, POB. 49, Hungary
-- 
Ken Gaillot <kgaillot at redhat.com>


