[ClusterLabs] corosync won't start after node failure
Murat Inal
mrt_nl at hotmail.com
Mon Aug 19 09:58:09 UTC 2024
[Resending the below due to message format problem]
Dear List,
I have been running two different 3-node clusters for some time. I am
having a fatal problem with corosync: after a node failure, the rebooted
node does NOT start corosync.
Clusters:
* All nodes are running Ubuntu Server 24.04
* corosync is 3.1.7
* corosync-qdevice is 3.0.3
* pacemaker is 2.1.6
* The third node in both clusters is a quorum device. The clusters use
the ffsplit algorithm.
* All nodes are baremetal & attached to a dedicated kronosnet network.
* STONITH is enabled in one of the clusters and disabled for the other.
The corosync & pacemaker systemd services are disabled at boot. I start
each cluster with the command pcs cluster start.
corosync NEVER starts AFTER a node failure (once the node is rebooted).
There is nothing in /var/log/corosync/corosync.log, and the service
freezes at:
Aug 01 12:54:56 [3193] charon corosync notice [MAIN ] Corosync Cluster Engine 3.1.7 starting up
Aug 01 12:54:56 [3193] charon corosync info [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
corosync never starts kronosnet. I checked the kronosnet interfaces; all
are OK and there is IP connectivity between the nodes. If I run
corosync -t, I get the same freeze.
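A minimal sketch of how the freeze described above could be confirmed in a
repeatable way, assuming GNU coreutils timeout is available on the node. The
helper name probe and the 30-second limit are hypothetical, chosen only for
illustration; timeout returns status 124 when the command had to be stopped
because it ran past the limit.

```shell
# Hypothetical helper: run a command under a time limit and report
# whether it finished on its own or had to be stopped by timeout.
probe() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit was reached.
        echo "hung"
    else
        echo "exited with status $status"
    fi
}

# Example (assumed 30-second limit, run as root on the affected node):
#   probe 30 corosync -t
```

On a node showing the problem, the config test would be expected to report
"hung" rather than an exit status. If the process does not react to the
default TERM signal, timeout itself may not return.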
I could ONLY manage to start corosync by reinstalling it: apt reinstall
corosync ; pcs cluster start.
The above issue has repeated itself at least 5-6 times. I do NOT see
anything in syslog either. I would be glad if you could point me toward
a solution.
Thanks,
Murat