[ClusterLabs] corosync won't start after node failure

Murat Inal mrt_nl at hotmail.com
Mon Aug 19 09:58:09 UTC 2024


[Resending the below due to message format problem]


Dear List,

I have been running two different 3-node clusters for some time. I am 
having a fatal problem with corosync: after a node failure, the rebooted 
node does NOT start corosync.

Cluster details:

  * All nodes are running Ubuntu Server 24.04
  * corosync is 3.1.7
  * corosync-qdevice is 3.0.3
  * pacemaker is 2.1.6
  * The third node in both clusters is a quorum device, using the
    ffsplit algorithm.
  * All nodes are bare metal and attached to a dedicated kronosnet network.
  * STONITH is enabled in one of the clusters and disabled in the other.

The corosync and pacemaker systemd services are disabled, so they do not 
start at boot. I start each cluster manually with the command 
pcs cluster start.
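
The manual start-up described above can be sketched roughly as follows. 
This is only an illustration: the unit names are assumed to be the stock 
corosync and pacemaker units, the run wrapper and the DRY_RUN variable 
are hypothetical helpers, and by default the commands are printed rather 
than executed.

```shell
#!/bin/sh
# Sketch of the manual start-up. Unit names are assumed to be the stock
# corosync.service and pacemaker.service.
# DRY_RUN defaults to 1 so the commands are only printed; set DRY_RUN=0
# on a cluster node to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run systemctl is-enabled corosync pacemaker   # should report "disabled" for both
run pcs cluster start                         # start cluster services on this node
run pcs status                                # inspect the resulting cluster state
```

With DRY_RUN=0, the first command should confirm that neither service is 
enabled at boot, and the second is the step that hangs after a node failure.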

corosync NEVER starts after a node failure (once the node has been 
rebooted). /var/log/corosync/corosync.log contains nothing beyond the 
following lines, at which point the service freezes:

Aug 01 12:54:56 [3193] charon corosync notice  [MAIN  ] Corosync Cluster Engine 3.1.7 starting up
Aug 01 12:54:56 [3193] charon corosync info    [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow

corosync never starts kronosnet. I checked the kronosnet interfaces: they 
are all OK, and there is IP connectivity between the nodes. Running 
corosync -t produces the same freeze.
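
One way to confirm from the log where the start-up stalls is to list which 
subsystems logged anything after the most recent "starting up" line. The 
following is a minimal sketch, not a corosync tool: the function name is 
hypothetical, and it assumes the bracketed subsystem tag layout seen in 
the log excerpt above (for example "[MAIN  ]").

```shell
#!/bin/sh
# Hypothetical helper: print each subsystem tag that appears in a corosync
# log file after the most recent "starting up" line, in order of first
# appearance. Assumes the "[TAG   ]" layout shown in the log excerpt.
subsystems_since_last_start() {
    logfile=$1
    awk '
        # A new start-up banner resets what has been seen so far.
        /Corosync Cluster Engine .* starting up/ { split("", seen); n = 0; next }
        # Tags are upper-case letters padded with spaces; "[3193]" (the PID)
        # does not match because it contains digits.
        match($0, /\[[A-Z]+ *\]/) {
            tag = substr($0, RSTART + 1, RLENGTH - 2)
            sub(/ +$/, "", tag)
            if (!(tag in seen)) { seen[tag] = 1; order[++n] = tag }
        }
        END { for (i = 1; i <= n; i++) print order[i] }
    ' "$logfile"
}
```

Usage would be subsystems_since_last_start /var/log/corosync/corosync.log. 
If the output is only MAIN, that is consistent with the freeze described 
here, since no transport-related subsystem has logged anything yet.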

The ONLY way I have managed to start corosync again is by reinstalling it: 
apt reinstall corosync ; pcs cluster start.

The above issue has repeated itself at least 5-6 times. I do NOT see 
anything in syslog either. I would be grateful for any guidance on how 
to solve this.

Thanks,

Murat


