[ClusterLabs] corosync dead loop in segfault handler

Thu Mar 9 11:25:59 CET 2017

On 08/03/17 11:04, cys wrote:
> At 2017-02-21 00:24:33, "Christine Caulfield" <ccaulfie at redhat.com> wrote:
>> Thanks, I can read that core now. It's something odd happening in the
>> sync() code that I can't quite diagnose without the blackbox. We've only
>> ever seen crashes like that when there's been network corruption or
>> on-wire incompatibilities. Has it happened before?
>>
>> Chrissie
>>
> 
> We caught another infloop today. The blackbox is attached.
> 

Thanks. Oddly, that looks like a completely different incident from the core
file we had last time. That one seemed to be in a node state transition,
whereas this one happened during stable running. The last thing to happen
was an IPC connection, which suggests that libqb might be involved. I
recently identified a bug in libqb that's triggered by using it for
multithreaded IPC access, but the only Red Hat software that does that
is clvmd, and the usage pattern in the blackbox output is not clvmd's. So
unless you have some custom-written multithreaded software that uses
libcmap extensively (do you?), I'm none the wiser, I'm afraid :/

Chrissie