[ClusterLabs] Upgrade corosync problem
Salvatore D'angelo
sasadangelo at gmail.com
Thu Jun 21 11:16:29 EDT 2018
Hi,
I upgraded my PostgreSQL/Pacemaker cluster to the following versions:
Pacemaker 1.1.14 -> 1.1.18
Corosync 2.3.5 -> 2.4.4
Crmsh 2.2.0 -> 3.0.1
Resource agents 3.9.7 -> 4.1.1
I started with the first node (I am upgrading one node at a time).
On a PostgreSQL slave node I ran:
crm node standby <node>
service pacemaker stop
service corosync stop
Then I built the tools above as described on their GitHub pages:
./autogen.sh (where required)
./configure
make (where required)
make install
Everything went OK. I expected the new files to overwrite the old ones. I left the dependencies from the old software in place because I noticed that ./configure didn't complain.
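One way to check the expectation that the new files replaced the old ones is to list every copy of a binary that the shell can find. The following is only a minimal sketch: the function name find_all_on_path is hypothetical, and it assumes that a source build may have installed under a different prefix than the earlier installation (an autotools ./configure installs under /usr/local unless --prefix is given).

```shell
# Hypothetical helper: print every executable copy of a command found in a
# colon-separated search path, in lookup order. More than one hit suggests
# that an old and a new installation exist side by side.
# Usage: find_all_on_path NAME [SEARCH_PATH]   (SEARCH_PATH defaults to $PATH)
find_all_on_path() (
    name=$1
    search_path=${2:-$PATH}
    IFS=:      # split the search path on colons (subshell, so IFS is not leaked)
    set -f     # disable globbing while the unquoted path is split
    for dir in $search_path; do
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)
```

For example, running find_all_on_path corosync and find_all_on_path corosync-quorumtool shows whether more than one copy of each tool is installed. The first line printed is the copy a plain command name resolves to.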
I started corosync.
service corosync start
To verify that corosync was working properly I used the following commands:
corosync-cfgtool -s
corosync-cmapctl | grep members
Everything seemed OK and I verified that my node had joined the cluster (at least that is my impression).
Then I ran into a problem. Running the command:
corosync-quorumtool -ps
I got the following error:
Cannot initialise CFG service
If I try to start pacemaker, I see only the pacemaker process running, and pacemaker.log contains the following lines:
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: get_cluster_type: Detected an active 'corosync' cluster
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: mcp_read_config: Reading configure for stack: corosync
Jun 21 15:09:38 [17115] pg1 pacemakerd: notice: main: Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios corosync-native atomic-attrd acls
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: main: Maximum core file size is: 18446744073709551615
Jun 21 15:09:38 [17115] pg1 pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd: warning: corosync_node_name: Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: notice: get_node_name: Could not obtain a node name for corosync nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer: Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer: Node 1 has uuid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
Jun 21 15:09:53 [17115] pg1 pacemakerd: error: cluster_connect_quorum: Could not connect to the Quorum API: 2
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: main: Exiting pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
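To pick the relevant entries out of a log like the one above, a small filter can help. This is a minimal sketch: the function name pcmk_problems is hypothetical, and it assumes the "daemon: severity: function:" line layout seen in this excerpt.

```shell
# Hypothetical helper: read a Pacemaker log on stdin and print only the
# lines whose severity field is "warning" or "error".
# Assumes the layout "... pacemakerd: <severity>: <function>: <message>".
pcmk_problems() {
    grep -E ': +(warning|error): '
}
```

It can be used as pcmk_problems < pacemaker.log, with the path adjusted to wherever the log file lives on the node.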
What is wrong with my procedure?