[ClusterLabs] Upgrade corosync problem

Thu Jun 21 11:16:29 EDT 2018

Hi,

I am upgrading my PostgreSQL/Pacemaker cluster to the following versions:
Pacemaker 1.1.14 -> 1.1.18
Corosync 2.3.5 -> 2.4.4
Crmsh 2.2.0 -> 3.0.1
Resource agents 3.9.7 -> 4.1.1

I started with the first node (I am upgrading one node at a time).
On a PostgreSQL slave node I did:

crm node standby <node>
service pacemaker stop
service corosync stop

Then I built the tools above as described on their GitHub pages.

./autogen.sh (where required)
./configure
make (where required)
make install

Everything went OK. I expected the new files to overwrite the old ones. I left the dependencies from the old software in place because ./configure didn't complain.
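One way to check the assumption that the new files replaced the old ones is to look for more than one copy of each tool on the PATH. Unless --prefix is passed, an autoconf-generated ./configure installs under /usr/local, so a source build can end up next to older packaged files instead of replacing them. The sketch below is only an illustration: find_copies is a hypothetical helper name, and the tool names in the loop are examples.

```shell
# List every executable named "$1" found in the directories of $PATH,
# in search order. More than one line of output means several copies exist.
# The body runs in a subshell so the IFS change does not leak out.
find_copies() (
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
        fi
    done
)

# Example: check a few of the rebuilt tools (names chosen for illustration).
for tool in corosync corosync-quorumtool pacemakerd; do
    find_copies "$tool"
done
```

If a tool shows up in two places, the shell runs the first one listed, which may not be the freshly built one.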
I started corosync.

service corosync start

To verify that corosync works properly I used the following commands:
corosync-cfgtool -s
corosync-cmapctl | grep members

Everything seemed OK, and I verified that my node joined the cluster (at least that is my impression).

Then I ran into a problem. Running the command:
corosync-quorumtool -ps

I got the following error:
Cannot initialise CFG service
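A possible next check, assuming (not confirmed here) that the error comes from the tool loading an older copy of a corosync client library or of libqb, is to see which shared libraries the binary actually resolves. The sketch below filters ldd output for those libraries; filter_cluster_libs is a hypothetical helper name and the library list is my own selection.

```shell
# Keep only the lines of `ldd` output (read from stdin) that mention
# corosync client libraries or libqb. Prints nothing if none match.
filter_cluster_libs() {
    grep -E 'lib(cfg|cmap|cpg|quorum|votequorum|corosync_common|qb)\.so' || true
}

# Usage on the upgraded node:
#   ldd "$(command -v corosync-quorumtool)" | filter_cluster_libs
```

The paths on the right-hand side of each line show whether the libraries come from the old installation or from the new build.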

If I try to start pacemaker, I only see the pacemaker process running, and pacemaker.log contains the following lines:

Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: crm_log_init:	Changed active directory to /var/lib/pacemaker/cores
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: get_cluster_type:	Detected an active 'corosync' cluster
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: mcp_read_config:	Reading configure for stack: corosync
Jun 21 15:09:38 [17115] pg1 pacemakerd:   notice: main:	Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios  corosync-native atomic-attrd acls
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: main:	Maximum core file size is: 18446744073709551615
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: qb_ipcs_us_publish:	server name: pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd:  warning: corosync_node_name:	Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: corosync_node_name:	Unable to get node name for nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd:   notice: get_node_name:	Could not obtain a node name for corosync nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_get_peer:	Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_get_peer:	Node 1 has uuid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_update_peer_proc:	cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
Jun 21 15:09:53 [17115] pg1 pacemakerd:    error: cluster_connect_quorum:	Could not connect to the Quorum API: 2
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: qb_ipcs_us_withdraw:	withdrawing server sockets
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: main:	Exiting pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_xml_cleanup:	Cleaning up memory from libxml2
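To pick the relevant lines out of a longer log, a small filter on the severity column can help. This is a minimal sketch that assumes the field layout of the excerpt above (month, day, time, [pid], host, daemon, severity); problem_lines is a hypothetical helper name, and the log path in the usage comment is an assumption that depends on the installation.

```shell
# Print only the warning and error lines of a Pacemaker log read from stdin.
# In the layout shown above the severity is the 7th whitespace-separated field.
problem_lines() {
    awk '$7 == "warning:" || $7 == "error:"'
}

# Usage (path is an assumption, adjust to where pacemaker.log lives):
#   problem_lines < /var/log/pacemaker.log
```

On the excerpt above this would keep the CS_ERR_TRY_AGAIN warning and the Quorum API error and drop the info and notice lines.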

What is wrong with my procedure?
