[ClusterLabs] lrmd segfault

Tue Jan 31 09:25:05 EST 2017

On 01/31/2017 03:12 PM, alexey at kurnosov.spb.ru wrote:
> As I said, we used the RPM from the standard repo, so it is unlikely to have been compiled
> incorrectly. According to the spec, the L5630 (the node's CPU) supports SSE4.2. And if it
> didn't, we would get an illegal instruction exception, not a segfault.

... and it is running on bare metal?
Just to be sure this is not caused by code patching done by a hypervisor...
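Both questions (does the CPU report SSE4.2, and is this bare metal?) can be checked directly on an SL7 node. The following is only a sketch and assumes an x86/x86-64 build with GCC or Clang, whose <cpuid.h> provides __get_cpuid(). It reads CPUID leaf 1: ECX bit 20 reports SSE4.2, and ECX bit 31 is the "hypervisor present" bit that most hypervisors set for guests. A hypervisor can hide that bit, so a clear bit is a strong hint of bare metal, not proof. The names cpu_has_sse42() and running_under_hypervisor() are made up for this example. On Linux, the sse4_2 and hypervisor flags in /proc/cpuinfo carry the same information.

```c
#include <cpuid.h>
#include <stdbool.h>

/* Fetch ECX of CPUID leaf 1 (feature flags). __get_cpuid() returns 0
 * when the requested leaf is not supported by the CPU. */
static bool cpuid_leaf1_ecx(unsigned int *ecx)
{
    unsigned int eax, ebx, edx;
    return __get_cpuid(1, &eax, &ebx, ecx, &edx) != 0;
}

/* ECX bit 20: SSE4.2 support as reported by the CPU. */
bool cpu_has_sse42(void)
{
    unsigned int ecx = 0;
    return cpuid_leaf1_ecx(&ecx) && (ecx & (1u << 20)) != 0;
}

/* ECX bit 31: "hypervisor present" bit. Usually set inside a VM and
 * clear on bare metal, but a hypervisor may choose to hide it. */
bool running_under_hypervisor(void)
{
    unsigned int ecx = 0;
    return cpuid_leaf1_ecx(&ecx) && (ecx & (1u << 31)) != 0;
}
```

Calling both functions from a small main() and printing the results is enough; if the spec sheet is right, cpu_has_sse42() should return true on the L5630.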

>
> --
> Alexey Kurnosov
>
> On Tue, Jan 31, 2017 at 07:34:18AM +0100, Kristoffer Grönlund wrote:
>> alexey at kurnosov.spb.ru writes:
>>
>>> Hi all.
>>>
>>> We have a heterogeneous Corosync/Pacemaker cluster of 5 nodes: 3 SL7 (Scientific Linux) and 2 SL6.
>>> On SL7, Pacemaker is installed from the standard repo (Corosync 2.3.4, Pacemaker 1.1.13-10); on SL6 it is built from source (same versions).
>>> The cluster is not uniform: some nodes have resource agents (RAs) that the others do not. crmsh is used for management.
>>> The SL6 nodes run surprisingly smoothly, but the SL7 nodes segfault steadily in exactly the same place.
>>> Here is an example:
>>>
>> Just from looking at the core dump, it looks like your processor doesn't
>> support the SSE extensions used by the newer version of the code. You'll
>> need to recompile and disable use of those extensions.
>>
>> It looks like the code is using SSE 4.2, which is relatively new:
>>
>> https://en.wikipedia.org/wiki/SSE4#SSE4.2
>>
>> Cheers,
>> Kristoffer
>>
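One detail worth checking against the recompile theory: the faulting frame, __strcasecmp_l_sse42 in ../sysdeps/x86_64/multiarch/strcmp-sse42.S, is glibc code, not Pacemaker code. glibc's x86-64 multiarch string functions choose an implementation at run time (through IFUNC resolvers) based on the CPU features they detect, so the SSE4.2 variant should only be selected on a CPU that reports SSE4.2, and rebuilding Pacemaker would not change that choice. The sketch below imitates the idea with a plain function pointer and GCC/Clang's __builtin_cpu_supports(). It is a simplified illustration, not glibc's resolver, and every name in it is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Signature shared by all variants (hypothetical names throughout). */
typedef int (*casecmp_fn)(const char *a, const char *b);

/* Portable byte-by-byte case-insensitive comparison. */
static int casecmp_generic(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    int diff;

    while ((diff = tolower(*p) - tolower(*q)) == 0 && *p != '\0') {
        p++;
        q++;
    }
    return diff;
}

/* Compiled with SSE4.2 enabled, so it may only run on CPUs that report
 * SSE4.2. A real library would use a hand-tuned routine here; this one
 * reuses the portable loop to keep the sketch short. */
__attribute__((target("sse4.2")))
static int casecmp_sse42(const char *a, const char *b)
{
    return casecmp_generic(a, b);
}

/* Pick a variant from the features of the CPU we are running on. */
static casecmp_fn resolve_casecmp(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") ? casecmp_sse42 : casecmp_generic;
}

/* Resolve once on first use, then always call the chosen variant.
 * Not thread-safe; good enough for an illustration. */
int dispatch_casecmp(const char *a, const char *b)
{
    static casecmp_fn impl = NULL;

    if (impl == NULL) {
        impl = resolve_casecmp();
    }
    return impl(a, b);
}
```

Because the choice is made on the machine that executes the code, the same binary takes the SSE4.2 path on one CPU and the generic path on another.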
>>> Core was generated by `/usr/libexec/pacemaker/lrmd'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
>>> 164             movdqu  (%rdi), %xmm1
>>> (gdb) bt
>>> #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
>>> #1  0x00007fed076136dc in crm_str_eq (a=<optimized out>, b=b@entry=0xed7070 "DRBD_D16", use_case=use_case@entry=0) at utils.c:1416
>>> #2  0x00007fed073eaafa in is_op_blocked (rsc=0xed7070 "DRBD_D16") at services.c:644
>>> #3  0x00007fed073eac1d in services_action_async (op=0xed58e0, action_callback=<optimized out>) at services.c:625
>>> #4  0x0000000000404e4a in lrmd_rsc_execute_service_lib (cmd=0xed9e10, rsc=0xed4500) at lrmd.c:1242
>>> #5  lrmd_rsc_execute (rsc=0xed4500) at lrmd.c:1308
>>> #6  lrmd_rsc_dispatch (user_data=0xed4500, user_data@entry=<error reading variable: value has been optimized out>) at lrmd.c:1317
>>> #7  0x00007fed07634c73 in crm_trigger_dispatch (source=0xed54c0, callback=<optimized out>, userdata=<optimized out>) at mainloop.c:107
>>> #8  0x00007fed055cb7aa in g_main_dispatch (context=0xeb4d40) at gmain.c:3109
>>> #9  g_main_context_dispatch (context=context@entry=0xeb4d40) at gmain.c:3708
>>> #10 0x00007fed055cbaf8 in g_main_context_iterate (context=0xeb4d40, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3779
>>> #11 0x00007fed055cbdca in g_main_loop_run (loop=0xe96510) at gmain.c:3973
>>> #12 0x00000000004028ce in main (argc=<optimized out>, argv=0x7ffe9b3b0fd8) at main.c:476
>>>
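Whatever the CPU supports, the faulting instruction itself is informative. The instruction at line 164, movdqu (%rdi), %xmm1, is an SSE2 load, and SSE2 is part of the x86-64 baseline. Under the System V x86-64 calling convention %rdi carries the first argument on entry, so this is most likely a read through (or derived from) the string `a`, which gdb shows as <optimized out>. Frame #1 shows crm_str_eq() called with use_case=0, i.e. a case-insensitive comparison. The sketch below is a hypothetical helper modeled on that call; str_eq_sketch() is not Pacemaker's source. It illustrates that a NULL guard catches NULL but cannot catch a pointer to freed or otherwise invalid memory, which would fault inside strcasecmp() in a way consistent with frame #0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper modeled on the crm_str_eq(a, b, use_case) call in
 * frame #1; this is NOT Pacemaker's actual implementation. */
bool str_eq_sketch(const char *a, const char *b, bool use_case)
{
    if (a == b) {
        return true;            /* same pointer, including both NULL */
    }
    if (a == NULL || b == NULL) {
        return false;           /* NULL guard keeps strcmp()/strcasecmp() safe */
    }
    /* No check here can tell a valid string from a dangling pointer:
     * freed or corrupted memory passes the NULL test and can fault inside
     * strcasecmp(), consistent with frame #0 of the backtrace. */
    return use_case ? strcmp(a, b) == 0 : strcasecmp(a, b) == 0;
}
```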
>>> Any help would be appreciated.
>>>
>>> --
>>> Alexey Kurnosov
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> -- 
>> // Kristoffer Grönlund
>> // kgronlund at suse.com