[ClusterLabs] Upgrade corosync problem
Salvatore D'angelo
sasadangelo at gmail.com
Fri Jun 22 05:14:52 EDT 2018
Hi Christine,
Thanks for the reply. Let me add a few details. When I start the corosync service I see the corosync process running. If I stop it and run:
corosync -f
I see the following warnings:
warning [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
warning [MAIN ] Please migrate config file to nodelist.
warning [MAIN ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
warning [MAIN ] Could not set priority -2147483648: Permission denied (13)
but I see that the node joined.
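The first two warnings appear to come from the configuration containing both bindnetaddr entries in the interface sections and a nodelist section. A minimal Python sketch of a check for that combination is shown here. The function name and the regular expressions are illustrative assumptions, not part of corosync, and this is a plain text scan rather than a real parser of the config format.

```python
import re


def mixes_bindnetaddr_and_nodelist(conf_text: str) -> bool:
    """Return True if the text has both a nodelist section and a bindnetaddr key."""
    has_nodelist = re.search(r"^\s*nodelist\s*\{", conf_text, re.MULTILINE) is not None
    has_bindnetaddr = re.search(r"^\s*bindnetaddr\s*:", conf_text, re.MULTILINE) is not None
    return has_nodelist and has_bindnetaddr


# Abbreviated fragment modelled on the corosync.conf in this message.
sample_conf = """
totem {
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
    }
}
nodelist {
    node {
        ring0_addr: pg1
        nodeid: 1
    }
}
"""

print(mixes_bindnetaddr_and_nodelist(sample_conf))
```

To check a real file, the same function could be called on the contents of the config file read as text.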
My corosync.conf file is below.
With service corosync up and running I have the following output:
corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 10.0.0.11
status = ring 0 active with no faults
RING ID 1
id = 192.168.0.11
status = ring 1 active with no faults
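For checking this output across several nodes, the ring status can be turned into a small data structure. The following Python sketch parses text in the shape shown above. The function names are hypothetical, and treating a status line without the phrase "no faults" as faulty is an assumption based only on this sample.

```python
import re


def parse_ring_status(output: str) -> dict:
    """Map each ring number to the key/value lines that follow it."""
    rings = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            current = int(match.group(1))
            rings[current] = {}
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            rings[current][key] = value
    return rings


def faulty_rings(rings: dict) -> list:
    """List ring numbers whose status does not say 'no faults' (assumed wording)."""
    return [n for n, info in rings.items() if "no faults" not in info.get("status", "")]


# Sample text copied from the corosync-cfgtool -s output in this message.
sample = """Printing ring status.
Local node ID 1
RING ID 0
id = 10.0.0.11
status = ring 0 active with no faults
RING ID 1
id = 192.168.0.11
status = ring 1 active with no faults"""

rings = parse_ring_status(sample)
print(rings)
print(faulty_rings(rings))
```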
corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.0.11) r(1) ip(192.168.0.11)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.0.12) r(1) ip(192.168.0.12)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
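The membership keys above can be grouped by node ID to see at a glance which nodes report a joined status. This is a minimal Python sketch under the assumption that every line has the form `key (type) = value` as in the output shown. The helper names are hypothetical.

```python
import re

# Matches lines such as:
# runtime.totem.pg.mrp.srp.members.1.status (str) = joined
MEMBER_RE = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.(\w+) \((\w+)\) = (.*)$"
)


def parse_members(output: str) -> dict:
    """Group the member keys by node ID, keeping values as strings."""
    members = {}
    for raw in output.splitlines():
        match = MEMBER_RE.match(raw.strip())
        if match:
            nodeid, field, _value_type, value = match.groups()
            members.setdefault(int(nodeid), {})[field] = value
    return members


def joined_nodes(members: dict) -> list:
    """Return the sorted node IDs whose status field equals 'joined'."""
    return sorted(n for n, fields in members.items() if fields.get("status") == "joined")


# Sample text copied from the corosync-cmapctl output in this message.
sample = """runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.0.11) r(1) ip(192.168.0.11)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.0.12) r(1) ip(192.168.0.12)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined"""

members = parse_members(sample)
print(joined_nodes(members))
```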
For the moment I have two nodes in my cluster (the third node had some issues, so for now I ran crm node standby on it).
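With expected_votes: 3 in the quorum section of the corosync.conf in this message and two members joined, the quorum arithmetic can be sketched as follows. This assumes the default votequorum majority rule (more than half of the expected votes), one vote per node, and no options such as two_node or last_man_standing. The function names are illustrative.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes present reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


expected_votes = 3  # from the quorum section of corosync.conf
votes_present = 2   # assumption: one vote for each of the two joined members

print(quorum_threshold(expected_votes))
print(is_quorate(votes_present, expected_votes))
```

Under these assumptions two joined members out of three expected votes would be enough for quorum, while a single member would not.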
Here are the dependencies I have installed for corosync (they work fine with pacemaker 1.1.14 and corosync 2.3.5):
libnspr4-dev_2%253a4.10.10-0ubuntu0.14.04.1_amd64.deb
libnspr4_2%253a4.10.10-0ubuntu0.14.04.1_amd64.deb
libnss3-dev_2%253a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
libnss3-nssdb_2%253a3.19.2.1-0ubuntu0.14.04.2_all.deb
libnss3_2%253a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
libqb-dev_0.16.0.real-1ubuntu4_amd64.deb
libqb0_0.16.0.real-1ubuntu4_amd64.deb
corosync.conf
---------------------
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.0.0
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
    max_network_delay: 100
    retransmits_before_loss_const: 25
    window_size: 150
}
nodelist {
    node {
        ring0_addr: pg1
        ring1_addr: pg1p
        nodeid: 1
    }
    node {
        ring0_addr: pg2
        ring1_addr: pg2p
        nodeid: 2
    }
    node {
        ring0_addr: pg3
        ring1_addr: pg3p
        nodeid: 3
    }
}
logging {
    to_syslog: yes
}
> On 22 Jun 2018, at 09:24, Christine Caulfield <ccaulfie at redhat.com> wrote:
>
> On 21/06/18 16:16, Salvatore D'angelo wrote:
>> Hi,
>>
>> I upgraded my PostgreSQL/Pacemaker cluster with these versions.
>> Pacemaker 1.1.14 -> 1.1.18
>> Corosync 2.3.5 -> 2.4.4
>> Crmsh 2.2.0 -> 3.0.1
>> Resource agents 3.9.7 -> 4.1.1
>>
>> I started with the first node (I am upgrading one node at a time).
>> On a PostgreSQL slave node I did:
>>
>> *crm node standby <node>*
>> *service pacemaker stop*
>> *service corosync stop*
>>
>> Then I built the tools above as described on their GitHub.com
>> pages.
>>
>> *./autogen.sh (where required)*
>> *./configure*
>> *make (where required)*
>> *make install*
>>
>> Everything went ok. I expected the new files to overwrite the old ones.
>> I left the dependencies I had with the old software in place because
>> ./configure didn’t complain.
>> I started corosync.
>>
>> *service corosync start*
>>
>> To verify that corosync works properly I used the following commands:
>> *corosync-cfgtool -s*
>> *corosync-cmapctl | grep members*
>>
>> Everything seemed ok and I verified my node joined the cluster (at least
>> this is my impression).
>>
>> Here I hit a problem. Running the command:
>> corosync-quorumtool -ps
>>
>> I got the following error:
>> Cannot initialise CFG service
>>
> That says that corosync is not running. Have a look in the log files to
> see why it stopped. The pacemaker logs below are showing the same thing,
> but we can't make any more guesses until we see what corosync itself is
> doing. Enabling debug in corosync.conf will also help if more detail is
> needed.
>
> Also starting corosync with 'corosync -pf' on the command-line is often
> a quick way of checking things are starting OK.
>
> Chrissie
>
>
>> If I try to start pacemaker, I only see the pacemaker process running,
>> and pacemaker.log contains the following lines:
>>
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: crm_log_init:Changed active directory to /var/lib/pacemaker/cores
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: get_cluster_type:Detected an active 'corosync' cluster
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: mcp_read_config:Reading configure for stack: corosync
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: notice: main:Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios corosync-native atomic-attrd acls
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: main:Maximum core file size is: 18446744073709551615
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: qb_ipcs_us_publish:server name: pacemakerd
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: warning: corosync_node_name:Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: corosync_node_name:Unable to get node name for nodeid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: notice: get_node_name:Could not obtain a node name for corosync nodeid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer:Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer:Node 1 has uuid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_update_peer_proc:cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: error: cluster_connect_quorum:Could not connect to the Quorum API: 2
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: qb_ipcs_us_withdraw:withdrawing server sockets
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: main:Exiting pacemakerd
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_xml_cleanup:Cleaning up memory from libxml2
>>
>> *What is wrong in my procedure?*
>>
>>
>>
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>> https://lists.clusterlabs.org/mailman/listinfo/users <https://lists.clusterlabs.org/mailman/listinfo/users>
>>
>> Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
>> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/>
>>