[ClusterLabs] Detecting pacemaker version incompatibility during node rebuild
Madison Kelly
mkelly at alteeve.com
Fri Jun 14 02:57:00 UTC 2024
Hi all,
I'm working on a tool to rebuild a node that was lost. Given this
scenario, upgrading the surviving node is not viable (at least, not
until after the rebuild is completed and the services can be migrated).
I ran into a problem where 'pcs cluster start' exits with RC 0 and
it _looks_ like the cluster is starting, but pacemaker then exits without
printing anything to STDOUT. In the logs, though, I can see this:
====
Jun 13 22:35:04 an-a01n02.alteeve.com pacemaker-controld[105161]:
notice: Node an-a01n01 state is now member
Jun 13 22:35:04 an-a01n02.alteeve.com pacemaker-controld[105161]: error:
Local feature set (3.17.4) is incompatible with DC's (3.19.0)
Jun 13 22:35:04 an-a01n02.alteeve.com pacemaker-controld[105161]:
notice: Forcing immediate exit with status 100 (Fatal error occurred,
will not respawn)
Jun 13 22:35:04 an-a01n02.alteeve.com pacemaker-controld[105161]:
warning: Inhibiting respawn
====
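For context, the only way I can think of to catch this at the moment is
something like the following after calling 'pcs cluster start' (a rough
sketch; the grep string is just what showed up in my journal above):

====
# 'pcs cluster start' returns 0, so wait a moment and then check whether
# pacemaker actually stayed up on this node.
pcs cluster start
sleep 10
if ! systemctl is-active --quiet pacemaker; then
    # Look for the feature set error from the log excerpt above.
    journalctl -u pacemaker --since "2 minutes ago" | grep -i 'incompatible'
    echo "pacemaker exited shortly after 'pcs cluster start'" >&2
    exit 1
fi
====

That feels fragile, which is why I'm hoping there's a proper pre-flight check.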
So I have two questions:
1. Is there a way to check (using pcs or another tool) whether the
local machine's version is compatible with the peer's? (A rough sketch of
what I'm imagining follows these two questions.)
2. If the node being rebuilt isn't compatible, is there a way to tell it
to start in a compatibility mode, or to tell the surviving peer to
switch to one? Whichever applies, depending on which side is newer.
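To be clearer about question 1, what I'm hoping for is something I can run
before 'pcs cluster start'. A minimal sketch of the idea, assuming
passwordless ssh to the surviving peer (an-a01n01 in the log above) and
that 'pacemakerd --features' reports the CRM feature set on both nodes:

====
# Compare the local CRM feature set against the peer's before trying to join.
local_fs=$(pacemakerd --features | sed -n 's/.*Supporting v\([0-9.]*\).*/\1/p')
peer_fs=$(ssh root@an-a01n01 pacemakerd --features | sed -n 's/.*Supporting v\([0-9.]*\).*/\1/p')

# Exact equality is stricter than what pacemaker itself allows during a
# rolling upgrade, but it's the simple, safe test for this sketch.
if [ "$local_fs" != "$peer_fs" ]; then
    echo "CRM feature set mismatch: local $local_fs vs peer $peer_fs" >&2
    exit 1
fi
====

If there's an existing pcs/crm command that does this comparison properly,
that would obviously be better than the above.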
Of course, in this particular test case, the node being rebuilt is
behind the survivor, so the fix here is a simple update of pacemaker
before rejoining. However, in the real world, it's far more likely that
the node being rejoined will be running a newer version.
The reason for this is that a large number of our deployments are in
locations with no or limited internet access, so keeping the active cluster
regularly updated is not feasible (and some clients "lock" their
deployments to approved/tested versions).
Thanks for any hints/tips!
Madi
--
wiki - https://alteeve.com/w
cell - 647-471-0951
work - 647-417-7486 x 404