[ClusterLabs] lrmd segfault

Tue Feb 7 02:07:07 UTC 2017

>
>Are you absolutely sure that the node with the issue has 1.1.15, and
>that pacemaker has been restarted since 1.1.15 was deployed?
>

Yes. We upgraded to 1.1.15 about one month ago.
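
For readers following the backtrace quoted below: frames #1 and #2 show a string comparison crashing while is_op_blocked walks the in-flight list, and the memory dump shows the list entry no longer points at a valid svc_action_t. Below is a minimal sketch in C of one way such a state can arise, and the pattern that avoids it: unlink an operation from the list before freeing it. This is an assumption about the failure mode, not Pacemaker's code and not a description of the commits mentioned below; op_t, inflight, op_start, op_finish and is_blocked are hypothetical names made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() and strcasecmp() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for an in-flight operation (NOT svc_action_t). */
typedef struct op {
    char *rsc;         /* resource name, e.g. "p_vip" */
    struct op *next;   /* next in-flight operation */
} op_t;

static op_t *inflight = NULL;   /* head of the hypothetical in-flight list */

/* Allocate an operation for rsc and push it onto the in-flight list. */
static op_t *op_start(const char *rsc)
{
    op_t *op = calloc(1, sizeof(*op));
    if (op == NULL) {
        return NULL;
    }
    op->rsc = strdup(rsc);
    if (op->rsc == NULL) {
        free(op);
        return NULL;
    }
    op->next = inflight;
    inflight = op;
    return op;
}

/* Unlink op from the in-flight list BEFORE freeing it. Freeing without
 * unlinking would leave a dangling entry that a later list walk would
 * dereference, which is the kind of state the dump above suggests. */
static void op_finish(op_t *op)
{
    assert(op != NULL);
    for (op_t **pp = &inflight; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == op) {
            *pp = op->next;
            break;
        }
    }
    free(op->rsc);
    free(op);
}

/* Return 1 if any in-flight operation belongs to rsc (case-insensitive). */
static int is_blocked(const char *rsc)
{
    for (op_t *it = inflight; it != NULL; it = it->next) {
        if (it->rsc != NULL && strcasecmp(it->rsc, rsc) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With this pattern, starting an operation for "p_vip" makes is_blocked("p_vip") return 1, and after op_finish on that operation it returns 0 again, because the list never holds a pointer to freed memory.
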

At 2017-02-07 01:07:48, "Ken Gaillot" <kgaillot at redhat.com> wrote:
>On 02/06/2017 05:47 AM, cys wrote:
>> Hi All.
>> 
>> Recently we got an lrmd coredump. It occurred only once and we don't know how to reproduce it.
>> The version we use is pacemaker-1.1.15-11. The OS is CentOS 7.
>> 
>> Core was generated by `/usr/libexec/pacemaker/lrmd'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  __strcasecmp_l_avx () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
>> 164             movdqu  (%rdi), %xmm1
>> (gdb) bt
>> #0  __strcasecmp_l_avx () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
>> #1  0x00007fd6d554a53c in crm_str_eq (a=<optimized out>, b=b@entry=0x7fd6d6d42800 "p_vip", use_case=use_case@entry=0) at utils.c:1454
>> #2  0x00007fd6d5322baa in is_op_blocked (rsc=0x7fd6d6d42800 "p_vip") at services.c:653
>> #3  0x00007fd6d5322ca5 in services_action_async (op=0x7fd6d6d5f8d0, action_callback=<optimized out>) at services.c:634
>> #4  0x00007fd6d59af67c in lrmd_rsc_execute_service_lib (cmd=0x7fd6d6d69bd0, rsc=0x7fd6d6d5d6f0) at lrmd.c:1242
>> #5  lrmd_rsc_execute (rsc=0x7fd6d6d5d6f0) at lrmd.c:1308
>> #6  lrmd_rsc_dispatch (user_data=0x7fd6d6d5d6f0, user_data@entry=<error reading variable: value has been optimized out>) at lrmd
>> #7  0x00007fd6d55699f6 in crm_trigger_dispatch (source=0x7fd6d6d59190, callback=<optimized out>, userdata=<optimized out>) at mainloop.c:107
>> #8  0x00007fd6d29757aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>> #9  0x00007fd6d2975af8 in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
>> #10 0x00007fd6d2975dca in g_main_loop_run () from /lib64/libglib-2.0.so.0
>> #11 0x00007fd6d59ad3ad in main (argc=<optimized out>, argv=0x7fff4bd0def8) at main.c:476
>> (gdb) p inflight_ops->data
>> $4 = (gpointer) 0x7fd6d6d605c0
>> (gdb) x/10xg 0x7fd6d6d605c0
>> 0x7fd6d6d605c0: 0x0000000000000000     0x0000000300000002
>> 0x7fd6d6d605d0: 0x0000000200000004      0x0000000000000005
>> 0x7fd6d6d605e0: 0x0000000000000008      0x0000000000000000
>> 0x7fd6d6d605f0: 0x0000000d00000000      0x0000000f0000000e
>> 0x7fd6d6d60600: 0x0000000100000001      0x0000001300000000
>> 
>> The memory at inflight_ops->data is not a valid svc_action_t object.
>> 
>> I saw a similar problem at http://lists.clusterlabs.org/pipermail/users/2017-January/004906.html.
>> But that thread said the problem was fixed in 1.1.15.
>> 
>> Any help would be appreciated.
>
>That's odd -- it does look like the issue fixed by commits 67d68df and
>786ebc4 in 1.1.15.
>
>Are you absolutely sure that the node with the issue has 1.1.15, and
>that pacemaker has been restarted since 1.1.15 was deployed?
>
>If so, you may want to open an issue at bugs.clusterlabs.org and attach
>the output of crm_report covering the time when the problem occurred.
>
>
>_______________________________________________
>Users mailing list: Users at clusterlabs.org
>http://lists.clusterlabs.org/mailman/listinfo/users
>
>Project Home: http://www.clusterlabs.org
>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>Bugs: http://bugs.clusterlabs.org