Kadlecsik József kadlecsik.jozsef at wigner.hu
Mon Aug 17 06:12:36 EDT 2020


While upgrading a corosync/pacemaker/libvirt/KVM cluster from Debian stretch 
to buster, all the node utilization attributes were erased from the 
configuration. However, the same attributes were kept on the VirtualDomain 
resources. As a result, all resources with utilization attributes 
were stopped.
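
To illustrate the effect, here is a minimal sketch in Python. It is only a 
simplified model, not Pacemaker's actual scheduler code. It assumes that a 
utilization-aware placement-strategy treats a capacity missing from a node 
as zero; the attribute names (cpu, memory), the values and the function 
name are hypothetical.

```python
from typing import Dict, List


def unmet_requirements(node_capacity: Dict[str, int],
                       resource_needs: Dict[str, int]) -> List[str]:
    """Return one message per utilization attribute the node cannot satisfy."""
    problems = []
    for name, needed in resource_needs.items():
        # Assumption: a capacity that is not defined on the node counts as 0.
        available = node_capacity.get(name, 0)
        if needed > available:
            problems.append(f"{name}: needs {needed}, node offers {available}")
    return problems


# Hypothetical values, for illustration only.
vm_needs = {"cpu": 2, "memory": 4096}
node_before = {"cpu": 16, "memory": 65536}  # node attributes before the upgrade
node_after = {}                             # node attributes erased by the upgrade

# Before the upgrade the list is empty, so the resource can be placed.
print(unmet_requirements(node_before, vm_needs))

# After the upgrade every requirement fails, so the resource stays stopped.
for line in unmet_requirements(node_after, vm_needs):
    print("cannot place VM:", line)
```

In this model the resource definition is unchanged, yet no node is eligible 
any more, and the returned messages are the kind of hint one would hope to 
find in the logs.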

The documentation says: "You can name utilization attributes according to 
your preferences and define as many name/value pairs as your configuration 
needs.", so one assumes utilization attributes are kept during upgrades, 
for both nodes and resources.

The corosync incompatibility had already made the upgrade more stressful, 
and the stopping of the resources came out of the blue. The resources could 
not be started, of course, and there were no warning or error messages in 
the logs saying that the resources were not started because the utilization 
constraints could not be satisfied. Pacemaker logs a lot (from an admin's 
point of view, too much), but in this case there was no indication of why 
the resources could not be started (or we were unable to find it in the 
logs). So we wasted a lot of time debugging the VirtualDomain agent.

Currently we run the cluster with the placement-strategy set to default.

In my opinion, node attributes should be preserved during an upgrade. 
Also, it should be logged when a resource must be stopped or cannot be 
started because the utilization constraints cannot be satisfied.

