[ClusterLabs] cib state is now lost

Ken Gaillot kgaillot at redhat.com
Mon Aug 10 09:54:54 EDT 2015


On 08/09/2015 02:27 PM, David Neudorfer wrote:
> Where can I dig deeper to figure out why cib keeps terminating? selinux and
> iptables are both disabled and I have debug enabled. Google hasn't been
> able to help me thus far.
> 
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:    debug:
> get_local_nodeid: 	Local nodeid is 84939948
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> plugin_get_details: 	Server details: id=84939948 uname=ip-172-20-16-5
> cname=pcmk
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> crm_get_peer: 	Created entry
> c1f204b2-c994-48d9-81b6-87e1a7fc1ee7/0xa2c460 for node
> ip-172-20-16-5/84939948 (1 total)
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> crm_get_peer: 	Node 84939948 is now known as ip-172-20-16-5
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> crm_get_peer: 	Node 84939948 has uuid ip-172-20-16-5
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> crm_update_peer_proc: 	init_cs_connection_classic: Node
> ip-172-20-16-5[84939948] - unknown is now online
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> init_cs_connection_once: 	Connection to 'classic openais (with
> plugin)': established
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> get_node_name: 	Defaulting to uname -n for the local classic openais
> (with plugin) node name
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> qb_ipcs_us_publish: 	server name: cib_ro
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> qb_ipcs_us_publish: 	server name: cib_rw
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> qb_ipcs_us_publish: 	server name: cib_shm
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info: cib_init:
> 	Starting cib mainloop
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> plugin_handle_membership: 	Membership 104: quorum acquired
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> crm_update_peer_proc: 	plugin_handle_membership: Node
> ip-172-20-16-5[84939948] - unknown is now member
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> crm_update_peer_state: 	cib_peer_update_callback: Node
> ip-172-20-16-5[84939948] - state is now lost (was (null))
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> crm_reap_dead_member: 	Removing ip-172-20-16-5/84939948 from the
> membership list
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> reap_crm_member: 	Purged 1 peers with id=84939948 and/or uname=(null)
> from the membership cache
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> crm_update_peer_state: 	plugin_handle_membership: Node ��[2077843320]
> - state is now member (was member)
> Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> crm_update_peer: 	plugin_handle_membership: Node ��: id=2077843320
> state=r(0) ip(172.20.16.5)  addr=r(0) ip(172.20.16.5)  (new) votes=1
> (new) born=104 seen=104 proc=00000000000000000000000000111312

The unprintable characters strongly imply memory corruption. There are
known issues with that when using the legacy plugin with some versions
of pacemaker. What version are you using? If you are compiling yourself,
I would recommend using the current upstream master branch (not 1.1.13,
which has the issue).
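
To see how often the node name comes out garbled, something like the
following could be run over the log. This is only a rough sketch, not a
pacemaker tool: the function names are made up for illustration, and the
pattern assumes the "Node <name>[<id>]" and "Node <name>: id=<id>"
message formats seen in the excerpt above, with each message on one
unwrapped line.

```python
import re

# Assumed message formats, taken from the log excerpt above:
#   "Node <name>[<id>] - state is now ..."
#   "Node <name>: id=<id> state=..."
NODE_RE = re.compile(r"Node\s+(?P<name>\S*?)(?:\[|:\s+id=)(?P<id>\d+)")


def is_suspicious(name):
    """Return True if the name is empty or contains anything other
    than printable, non-space ASCII characters."""
    return not name or any(not (33 <= ord(ch) <= 126) for ch in name)


def find_corrupt_node_names(lines):
    """Return (line number, node id) pairs for lines whose node name
    looks garbled. Line numbers start at 1."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = NODE_RE.search(line)
        if match and is_suspicious(match.group("name")):
            hits.append((lineno, match.group("id")))
    return hits
```

Open the log with open(path, errors="replace") so that invalid bytes are
turned into replacement characters instead of raising a decode error,
then pass the file object to find_corrupt_node_names(). Any hit is a
node id whose name was not stored or printed correctly.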

An even better solution would be to switch to corosync 2 instead of the
plugin, as corosync 2 gets more development and testing these days.

> 
> https://gist.github.com/davidneudorfer/bc97082a9d9dfb12985b



