[ClusterLabs] cib state is now lost

David Neudorfer david.neudorfer at warbyparker.com
Wed Aug 12 10:29:17 UTC 2015


Thanks Ken,

We're currently using Pacemaker 1.1.11, and at the moment it's not an option
to upgrade.
I've spun these boxes up and down on AWS and even tried different instance
sizes. I think a recent upgrade broke this deploy.

This is the output from dmesg:

cib[16656] general protection ip:7f45391e9545 sp:7ffddf16c8b8 error:0 in libc-2.12.so[7f45390be000+18a000]
cib[16659] general protection ip:7fa36fa89545 sp:7ffe28416288 error:0 in libc-2.12.so[7fa36f95e000+18a000]
cib[16663] general protection ip:7fa3defce545 sp:7ffeb5b29c58 error:0 in libc-2.12.so[7fa3deea3000+18a000]
cib[16666] general protection ip:7fa1cefe4545 sp:7ffcc4b9c778 error:0 in libc-2.12.so[7fa1ceeb9000+18a000]
cib[16669] general protection ip:7f4b3900f545 sp:7ffdcd65aaf8 error:0 in libc-2.12.so[7f4b38ee4000+18a000]
cib[16672] general protection ip:7fc38be2b545 sp:7fffbc7e1598 error:0 in libc-2.12.so[7fc38bd00000+18a000]
cib[16675] general protection ip:7f9c6890c545 sp:7ffca09539f8 error:0 in libc-2.12.so[7f9c687e1000+18a000]
cib[16678] general protection ip:7f1c636ad545 sp:7ffc677d2008 error:0 in libc-2.12.so[7f1c63582000+18a000]
cib[16681] general protection ip:7fed0b47e545 sp:7ffd051f0618 error:0 in libc-2.12.so[7fed0b353000+18a000]
cib[16684] general protection ip:7f2ee87cd545 sp:7fff8d9ae288 error:0 in libc-2.12.so[7f2ee86a2000+18a000]
cib[16687] general protection ip:7f41c3789545 sp:7fff9f005848 error:0 in libc-2.12.so[7f41c365e000+18a000]
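
A note on reading these lines: subtracting the library's load address (the
value in brackets before the "+") from the faulting instruction pointer gives
the offset inside libc where each crash happened. The Python sketch below is
only an illustration and not part of the original exchange; the function name
fault_offsets and the regular expression are assumptions made for this
example, and it only handles lines shaped like the dmesg output above.

```python
import re

# Matches kernel "general protection" lines like the ones quoted above.
# \s+ after "in" tolerates the line being wrapped by the mail client.
GP_LINE = re.compile(
    r"(?P<proc>\w+)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ in\s+"
    r"(?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


def fault_offsets(dmesg_text):
    """Return (pid, library, offset) for each general protection fault.

    offset = faulting instruction pointer - library load address.
    """
    results = []
    for m in GP_LINE.finditer(dmesg_text):
        offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
        results.append((int(m.group("pid")), m.group("lib"), hex(offset)))
    return results


# First two entries from the dmesg output above.
sample = """\
cib[16656] general protection ip:7f45391e9545 sp:7ffddf16c8b8 error:0 in
libc-2.12.so[7f45390be000+18a000]
cib[16659] general protection ip:7fa36fa89545 sp:7ffe28416288 error:0 in
libc-2.12.so[7fa36f95e000+18a000]
"""

for pid, lib, offset in fault_offsets(sample):
    print(pid, lib, offset)
# 16656 libc-2.12.so 0x12b545
# 16659 libc-2.12.so 0x12b545
```

Every fault listed above works out to the same offset, 0x12b545, so each
respawned cib process appears to be dying at the same instruction in libc.
Mapping that offset to a function name would need the matching libc debug
symbols, which are not shown here.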



On Mon, Aug 10, 2015 at 9:54 AM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 08/09/2015 02:27 PM, David Neudorfer wrote:
> > Where can I dig deeper to figure out why cib keeps terminating? selinux and
> > iptables are both disabled and I have debug enabled. Google hasn't been
> > able to help me thus far.
> >
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:    debug:
> > get_local_nodeid:     Local nodeid is 84939948
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > plugin_get_details:   Server details: id=84939948 uname=ip-172-20-16-5
> > cname=pcmk
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > crm_get_peer:         Created entry
> > c1f204b2-c994-48d9-81b6-87e1a7fc1ee7/0xa2c460 for node
> > ip-172-20-16-5/84939948 (1 total)
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > crm_get_peer:         Node 84939948 is now known as ip-172-20-16-5
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > crm_get_peer:         Node 84939948 has uuid ip-172-20-16-5
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > crm_update_peer_proc:         init_cs_connection_classic: Node
> > ip-172-20-16-5[84939948] - unknown is now online
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > init_cs_connection_once:      Connection to 'classic openais (with
> > plugin)': established
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> > get_node_name:        Defaulting to uname -n for the local classic openais
> > (with plugin) node name
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > qb_ipcs_us_publish:   server name: cib_ro
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > qb_ipcs_us_publish:   server name: cib_rw
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > qb_ipcs_us_publish:   server name: cib_shm
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info: cib_init:
> >       Starting cib mainloop
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> > plugin_handle_membership:     Membership 104: quorum acquired
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > crm_update_peer_proc:         plugin_handle_membership: Node
> > ip-172-20-16-5[84939948] - unknown is now member
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> > crm_update_peer_state:        cib_peer_update_callback: Node
> > ip-172-20-16-5[84939948] - state is now lost (was (null))
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> > crm_reap_dead_member:         Removing ip-172-20-16-5/84939948 from the
> > membership list
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> > reap_crm_member:      Purged 1 peers with id=84939948 and/or uname=(null)
> > from the membership cache
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:   notice:
> > crm_update_peer_state:        plugin_handle_membership: Node ��[2077843320]
> > - state is now member (was member)
> > Aug 09 18:54:29 [12526] ip-172-20-16-5        cib:     info:
> > crm_update_peer:      plugin_handle_membership: Node ��: id=2077843320
> > state=r(0) ip(172.20.16.5)  addr=r(0) ip(172.20.16.5)  (new) votes=1
> > (new) born=104 seen=104 proc=00000000000000000000000000111312
>
> The unprintable characters in the node name strongly imply memory
> corruption. There are known memory-corruption issues when using the legacy
> plugin with some versions of pacemaker. What version are you using? If you
> are compiling yourself, I would recommend using the current upstream master
> branch (not 1.1.13, which has the issue).
>
> An even better solution would be to switch to corosync 2 instead of the
> plugin, as corosync 2 gets more development and testing these days.
>
> >
> > https://gist.github.com/davidneudorfer/bc97082a9d9dfb12985b
>



-- 

David Neudorfer

Automation Engineer

WARBY PARKER
<http://www.warbyparker.com/>