[ClusterLabs] node utilization attributes are lost during upgrade
Ken Gaillot
kgaillot at redhat.com
Mon Aug 17 16:38:09 EDT 2020
On Mon, 2020-08-17 at 12:12 +0200, Kadlecsik József wrote:
> Hello,
>
> When upgrading a corosync/pacemaker/libvirt/KVM cluster from Debian
> stretch to buster, all the node utilization attributes were erased from
> the configuration. However, the same attributes were kept on the
> VirtualDomain resources. As a result, all resources with utilization
> attributes were stopped.
Ouch :(
There are two types of node attributes, transient and permanent.
Transient attributes last only until pacemaker is next stopped on the
node, while permanent attributes persist between reboots/restarts.
If you configured the utilization attributes with crm_attribute
-z/--utilization, it will default to permanent, but it's possible to
override that with -l/--lifetime reboot (or equivalently, -t/--type
status).
Permanent node attributes should definitely not be erased in an
upgrade.
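As a rough illustration of the lifetime distinction above, here is a small Python sketch (not part of Pacemaker) that only builds crm_attribute argument lists. It assumes the long options --node, --name, --update, --query, --utilization and --lifetime as documented for crm_attribute; check `crm_attribute --help` on your version. The node name "node1" and attribute "cpu" are hypothetical examples.

```python
from typing import List


def crm_utilization_cmd(node: str, name: str, value: str,
                        permanent: bool = True) -> List[str]:
    """Build a crm_attribute argv that sets a node utilization attribute.

    permanent=True  -> --lifetime forever (stored in the configuration,
                       survives pacemaker restarts and reboots)
    permanent=False -> --lifetime reboot (transient, lost the next time
                       pacemaker is stopped on the node)
    """
    lifetime = "forever" if permanent else "reboot"
    return [
        "crm_attribute",
        "--node", node,
        "--name", name,
        "--update", value,
        "--utilization",
        "--lifetime", lifetime,
    ]


def crm_utilization_query_cmd(node: str, name: str) -> List[str]:
    """Build a crm_attribute argv that reads back a node utilization attribute,
    e.g. to confirm it still exists after an upgrade."""
    return ["crm_attribute", "--node", node, "--name", name,
            "--utilization", "--query"]


if __name__ == "__main__":
    # Hypothetical node and attribute names; print instead of executing.
    print(" ".join(crm_utilization_cmd("node1", "cpu", "8")))
    print(" ".join(crm_utilization_query_cmd("node1", "cpu")))
```

On a real cluster node the lists could be passed to `subprocess.run(cmd, check=True)`. Setting `--lifetime forever` explicitly just makes the permanent default visible.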
>
> The documentation says: "You can name utilization attributes according
> to your preferences and define as many name/value pairs as your
> configuration needs.", so one would assume that utilization attributes
> are kept during upgrades, for nodes as well as for resources.
>
> The corosync incompatibility made the upgrade more stressful anyway,
> and the stopping of the resources came out of the blue. The resources
> could not be started, of course, and there were no log warning/error
> messages saying that the resources were not started because the
> utilization constraints could not be satisfied. Pacemaker logs a lot
> (from an admin's point of view, too much), but in this case there was no
> indication of why the resources could not be started (or were we unable
> to find it in the logs?). So we wasted a lot of time debugging the
> VirtualDomain agent.
>
> Currently we run the cluster with placement-strategy set to default.
>
> In my opinion, node attributes should be kept and preserved during an
> upgrade. Also, it should be logged when a resource must be stopped or
> cannot be started because the utilization constraints cannot be
> satisfied.
>
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.hu
> PGP key: https://wigner.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics
> H-1525 Budapest 114, POB. 49, Hungary
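To make the failure mode above concrete, here is a simplified Python model of a utilization check (a sketch, not Pacemaker's scheduler code). It assumes utilization is only considered when placement-strategy is something other than default (utilization, minimal or balanced). It also assumes that an attribute a node does not define counts as zero capacity, so erased node attributes leave no node able to host the resource. For brevity it compares against each node's total capacity and ignores capacity already used by other resources. The warning is the kind of message requested above, not an actual Pacemaker log line.

```python
import logging
from typing import Dict, List

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("utilization-sketch")


def fits(node: str, capacity: Dict[str, int], required: Dict[str, int]) -> bool:
    """Return True if `node` has enough capacity for every attribute in `required`.

    Assumption: an attribute missing from the node's capacity counts as 0.
    """
    ok = True
    for attr, need in required.items():
        have = capacity.get(attr, 0)
        if need > have:
            # Explain why the node was rejected instead of failing silently.
            log.warning("%s cannot host resource: needs %s=%d, node has %d",
                        node, attr, need, have)
            ok = False
    return ok


def candidate_nodes(nodes: Dict[str, Dict[str, int]],
                    required: Dict[str, int]) -> List[str]:
    """Return the names of nodes whose capacity covers the requirements."""
    return [name for name, cap in nodes.items() if fits(name, cap, required)]


if __name__ == "__main__":
    vm = {"cpu": 2, "memory": 4096}                      # hypothetical VM needs
    before = {"node1": {"cpu": 8, "memory": 32768}}      # attributes present
    after = {"node1": {}}                                # attributes erased
    print(candidate_nodes(before, vm))
    print(candidate_nodes(after, vm))
```

With the node attributes present, node1 is a candidate. Once they are gone, no node qualifies and the resource cannot be placed, which matches the symptom described.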
--
Ken Gaillot <kgaillot at redhat.com>