[ClusterLabs] Upgrade corosync problem

Salvatore D'angelo sasadangelo at gmail.com
Fri Jun 22 09:14:52 UTC 2018


Hi Christine,

Thanks for the reply. Let me add a few details. When I run the corosync service I see the corosync process running. If I stop it and run:

corosync -f 

I see the following warnings:
warning [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
warning [MAIN  ] Please migrate config file to nodelist.
warning [MAIN  ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
warning [MAIN  ] Could not set priority -2147483648: Permission denied (13)

but I see node joined.
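
If I read the first two warnings correctly, with transport udpu the node addresses come from the nodelist anyway, so dropping bindnetaddr from the interface sections should silence them. That is just my understanding, untested; each interface section would then look something like:

        interface {
                ringnumber: 0
                mcastport: 5405
                ttl: 1
        }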

My corosync.conf file is below.

With the corosync service up and running I have the following output:
corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
	id	= 10.0.0.11
	status	= ring 0 active with no faults
RING ID 1
	id	= 192.168.0.11
	status	= ring 1 active with no faults

corosync-cmapctl  | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.0.11) r(1) ip(192.168.0.11) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.0.12) r(1) ip(192.168.0.12) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined

For the moment I have two nodes in my cluster (the third node had some issues, so at the moment I have put it in standby with crm node standby).

Here are the dependencies I have installed for corosync (they work fine with pacemaker 1.1.14 and corosync 2.3.5):
     libnspr4-dev_2%3a4.10.10-0ubuntu0.14.04.1_amd64.deb
     libnspr4_2%3a4.10.10-0ubuntu0.14.04.1_amd64.deb
     libnss3-dev_2%3a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
     libnss3-nssdb_2%3a3.19.2.1-0ubuntu0.14.04.2_all.deb
     libnss3_2%3a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
     libqb-dev_0.16.0.real-1ubuntu4_amd64.deb
     libqb0_0.16.0.real-1ubuntu4_amd64.deb
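
For reference, these can be installed with dpkg roughly like this (just a sketch, assuming the .deb files are in the current directory):

     dpkg -i libqb0_0.16.0.real-1ubuntu4_amd64.deb libqb-dev_0.16.0.real-1ubuntu4_amd64.deb \
             libnspr4_*.deb libnspr4-dev_*.deb \
             libnss3_*.deb libnss3-nssdb_*.deb libnss3-dev_*.deb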

corosync.conf
---------------------
quorum {
        provider: corosync_votequorum
        expected_votes: 3
}
totem {
        version: 2
        crypto_cipher: none
        crypto_hash: none
        rrp_mode: passive
        interface {
                ringnumber: 0
                bindnetaddr: 10.0.0.0
                mcastport: 5405
                ttl: 1
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.0.0
                mcastport: 5405
                ttl: 1
        }
        transport: udpu
        max_network_delay: 100
        retransmits_before_loss_const: 25
        window_size: 150
}
nodelist {
        node {
                ring0_addr: pg1
                ring1_addr: pg1p
                nodeid: 1
        }
        node {
                ring0_addr: pg2
                ring1_addr: pg2p
                nodeid: 2
        }
        node {
                ring0_addr: pg3
                ring1_addr: pg3p
                nodeid: 3
        }
}
logging {
        to_syslog: yes
}
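
As Chrissie suggests in her reply below, if more detail is needed debug logging can be enabled in the logging section, something like this (the logfile path here is just an example):

logging {
        to_syslog: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        debug: on
}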




> On 22 Jun 2018, at 09:24, Christine Caulfield <ccaulfie at redhat.com> wrote:
> 
> On 21/06/18 16:16, Salvatore D'angelo wrote:
>> Hi,
>> 
>> I upgraded my PostgreSQL/Pacemaker cluster to these versions:
>> Pacemaker 1.1.14 -> 1.1.18
>> Corosync 2.3.5 -> 2.4.4
>> Crmsh 2.2.0 -> 3.0.1
>> Resource agents 3.9.7 -> 4.1.1
>> 
>> I started on a first node (I am trying a one-node-at-a-time upgrade).
>> On a PostgreSQL slave node I did:
>> 
>> *crm node standby <node>*
>> *service pacemaker stop*
>> *service corosync stop*
>> 
>> Then I built the tools above as described on their GitHub.com pages.
>> 
>> *./autogen.sh (where required)*
>> *./configure*
>> *make (where required)*
>> *make install*
>> 
>> Everything went OK. I expected the new files to overwrite the old ones. I left
>> the dependencies I had from the old software in place because I noticed that
>> ./configure didn't complain.
>> I started corosync.
>> 
>> *service corosync start*
>> 
>> To verify that corosync works properly I used the following commands:
>> *corosync-cfgtool -s*
>> *corosync-cmapctl | grep members*
>> 
>> Everything seemed OK and I verified that my node joined the cluster (at least
>> that is my impression).
>> 
>> Here I hit a problem. Running the command:
>> corosync-quorumtool -ps
>> 
>> I got the following problem:
>> Cannot initialise CFG service
>> 
> That says that corosync is not running. Have a look in the log files to
> see why it stopped. The pacemaker logs below are showing the same thing,
> but we can't make any more guesses until we see what corosync itself is
> doing. Enabling debug in corosync.conf will also help if more detail is
> needed.
> 
> Also starting corosync with 'corosync -pf' on the command-line is often
> a quick way of checking things are starting OK.
> 
> Chrissie
> 
> 
>> If I try to start pacemaker, I only see the pacemakerd process running, and
>> pacemaker.log contains the following lines:
>> 
>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: crm_log_init:Changed active directory to /var/lib/pacemaker/cores
>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: get_cluster_type:Detected an active 'corosync' cluster
>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: mcp_read_config:Reading configure for stack: corosync
>> Jun 21 15:09:38 [17115] pg1 pacemakerd:   notice: main:Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios  corosync-native atomic-attrd acls
>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: main:Maximum core file size is: 18446744073709551615
>> Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: qb_ipcs_us_publish:server name: pacemakerd
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:  warning: corosync_node_name:Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: corosync_node_name:Unable to get node name for nodeid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:   notice: get_node_name:Could not obtain a node name for corosync nodeid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_get_peer:Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_get_peer:Node 1 has uuid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_update_peer_proc:cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:    error: cluster_connect_quorum:Could not connect to the Quorum API: 2
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: qb_ipcs_us_withdraw:withdrawing server sockets
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: main:Exiting pacemakerd
>> Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_xml_cleanup:Cleaning up memory from libxml2
>> 
>> *What is wrong in my procedure?*
>> 
>> 
>> 
>> 
>> 
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org