[ClusterLabs] Corosync ifdown / ifup crash with Assertion `token_memb_entries >= 1' failed.

Jan Friesse jfriesse at redhat.com
Tue Jun 2 06:57:12 EDT 2015


Alexander,

Alexander T wrote:
> Jan,
> 
> Thank you for the explanation. What do you mean by "knet migration" in
> Corosync 3.0?

We are planning to replace the current network-related code with knet.

Currently we have three transports: multicast, UDPU and IBA (InfiniBand).
The problem with multicast is mainly badly configured switches, because
many network admins believe multicast is evil, and some switches are
really bad at multicast. The IBA code is almost unmaintained. That
leaves UDPU. UDPU works quite well, but it is still just a 1:1 rewrite
of multicast, where multicast is simulated by sending messages to all
configured members. All the weird requirements and the weird code base
remain.
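
To illustrate the UDPU idea described above, here is a minimal Python
sketch (not corosync code) of emulating a multicast send with one unicast
datagram per configured member. The function name send_to_members and the
member list are hypothetical and exist only for this illustration.

```python
import socket
from typing import Iterable, Tuple

Address = Tuple[str, int]


def send_to_members(sock, payload: bytes, members: Iterable[Address]) -> int:
    """Emulate a multicast send: one unicast datagram per configured member.

    `sock` is any object with a UDP-style sendto(data, addr) method.
    Returns the number of datagrams sent.
    """
    sent = 0
    for addr in members:
        sock.sendto(payload, addr)
        sent += 1
    return sent


if __name__ == "__main__":
    # Loopback demo: two receivers stand in for two cluster members.
    receivers = []
    for _ in range(2):
        r = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        r.bind(("127.0.0.1", 0))  # let the OS pick a free port
        r.settimeout(2.0)
        receivers.append(r)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    members = [r.getsockname() for r in receivers]
    send_to_members(sender, b"hello", members)

    for r in receivers:
        data, _ = r.recvfrom(1024)
        print(r.getsockname(), data)
        r.close()
    sender.close()
```

The point of the sketch is that the sender needs the full member list up
front and pays one send per member, which is what "simulated multicast"
amounts to here.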

Last but not least is RRP. RRP itself works very well; sadly, it works in
a totally different way than most people expect.

The solution is knet (http://www.kronosnet.org/). Knet should give corosync
proper support for multiple NICs (like software bonding), very fast
reaction time to link failure, proper handling of MTU, ...

Regards,
  Honza


> 
> Best regards,
> 
> Alexander
> 
> PS. Your signature led me to http://en.wikipedia.org/wiki/Honza which was
> an interesting read! DS.
> 
> On Tue, Jun 2, 2015 at 11:26 AM, Jan Friesse <jfriesse at redhat.com> wrote:
> 
>> Alexander,
>>
>> Alexander T wrote:
>>> Hello everyone!
>>>
>>> I was wondering if the project is aware of the bug at
>>> https://bugzilla.redhat.com/show_bug.cgi?id=989934 and if it has been fixed
>>> in recent releases.
>>
>> Yes, we are aware of this bug, and no, it's not fixed. Fixing this bug is
>> way harder than it looks, and the workaround (just don't do ifdown) is
>> simple. It will be fixed in Corosync 3.0 because of the knet migration, and
>> I still hope I will also be able to fix 2.x and 1.x, but not in the near
>> future.
>>
>>>
>>> It leaves pacemaker in a half-crashed state and it was non-trivial to track
>>> down in my case. If anyone could explain how this error occurs it would be
>>> beneficial for me, since I need to explain to a customer the reason why
>>> ifdown/up isn't working (and is unsupported).
>>
>> Technically it's a mix of weird requirements (most of which are
>> totally irrelevant today) and a weird implementation.
>>
>>>
>>> The traceback in my case is:
>>>
>>> #0  0x00007f9f65a31925 in raise (sig=6) at
>>> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>>> 64  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>>> (gdb) bt
>>> #0  0x00007f9f65a31925 in raise (sig=6) at
>>> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>>> #1  0x00007f9f65a33105 in abort () at abort.c:92
>>> #2  0x00007f9f65a2aa4e in __assert_fail_base (fmt=<value optimized out>,
>>> assertion=0x7f9f667f05b5 "token_memb_entries >= 1", file=0x7f9f667f0559
>>> "totemsrp.c",
>>>     line=<value optimized out>, function=<value optimized out>) at
>>> assert.c:96
>>> #3  0x00007f9f65a2ab10 in __assert_fail (assertion=0x7f9f667f05b5
>>> "token_memb_entries >= 1", file=0x7f9f667f0559 "totemsrp.c", line=1235,
>>>     function=0x7f9f667f09b0 "memb_consensus_agreed") at assert.c:105
>>> #4  0x00007f9f667de48c in memb_consensus_agreed (instance=0x7f9f66bbd010)
>>> at totemsrp.c:1235
>>> #5  0x00007f9f667e23df in memb_join_process (instance=0x7f9f66bbd010,
>>> memb_join=0x10fb294) at totemsrp.c:3991
>>> #6  0x00007f9f667e2789 in message_handler_memb_join
>>> (instance=0x7f9f66bbd010, msg=<value optimized out>, msg_len=<value
>>> optimized out>,
>>>     endian_conversion_needed=<value optimized out>) at totemsrp.c:4236
>>> #7  0x00007f9f667dc668 in rrp_deliver_fn (context=<value optimized out>,
>>> msg=0x10fb294, msg_len=333) at totemrrp.c:1747
>>> #8  0x00007f9f667d99da in net_deliver_fn (handle=<value optimized out>,
>>> fd=<value optimized out>, revents=<value optimized out>, data=0x10fabe0) at
>>> totemudpu.c:1153
>>> #9  0x00007f9f667d2352 in poll_run (handle=8184985674965843968) at
>>> coropoll.c:513
>>> #10 0x0000000000407966 in main (argc=<value optimized out>, argv=<value
>>> optimized out>, envp=<value optimized out>) at main.c:2043
>>>
>>> Best regards
>>>
>>> Alexander Torstling
>>>
>>>
>>
>> Regards,
>>   Honza
>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>