[ClusterLabs] lrmd segfault

Mon Feb 6 12:07:48 EST 2017

On 02/06/2017 05:47 AM, cys wrote:
> Hi All.
> 
> Recently we got an lrmd core dump. It occurred only once, and we don't know how to reproduce it.
> The version we use is pacemaker-1.1.15-11. The OS is CentOS 7.
> 
> Core was generated by `/usr/libexec/pacemaker/lrmd'.
> Program terminated with signal 11, Segmentation fault.
> #0  __strcasecmp_l_avx () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> 164             movdqu  (%rdi), %xmm1
> (gdb) bt
> #0  __strcasecmp_l_avx () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> #1  0x00007fd6d554a53c in crm_str_eq (a=<optimized out>, b=b@entry=0x7fd6d6d42800 "p_vip", use_case=use_case@entry=0) at utils.c:1454
> #2  0x00007fd6d5322baa in is_op_blocked (rsc=0x7fd6d6d42800 "p_vip") at services.c:653
> #3  0x00007fd6d5322ca5 in services_action_async (op=0x7fd6d6d5f8d0, action_callback=<optimized out>) at services.c:634
> #4  0x00007fd6d59af67c in lrmd_rsc_execute_service_lib (cmd=0x7fd6d6d69bd0, rsc=0x7fd6d6d5d6f0) at lrmd.c:1242
> #5  lrmd_rsc_execute (rsc=0x7fd6d6d5d6f0) at lrmd.c:1308
> #6  lrmd_rsc_dispatch (user_data=0x7fd6d6d5d6f0, user_data@entry=<error reading variable: value has been optimized out>) at lrmd
> #7  0x00007fd6d55699f6 in crm_trigger_dispatch (source=0x7fd6d6d59190, callback=<optimized out>, userdata=<optimized out>) at mainloop.c:107
> #8  0x00007fd6d29757aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #9  0x00007fd6d2975af8 in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
> #10 0x00007fd6d2975dca in g_main_loop_run () from /lib64/libglib-2.0.so.0
> #11 0x00007fd6d59ad3ad in main (argc=<optimized out>, argv=0x7fff4bd0def8) at main.c:476
> (gdb) p inflight_ops->data
> $4 = (gpointer) 0x7fd6d6d605c0
> (gdb) x/10xg 0x7fd6d6d605c0
> 0x7fd6d6d605c0: 0x0000000000000000     0x0000000300000002
> 0x7fd6d6d605d0: 0x0000000200000004      0x0000000000000005
> 0x7fd6d6d605e0: 0x0000000000000008      0x0000000000000000
> 0x7fd6d6d605f0: 0x0000000d00000000      0x0000000f0000000e
> 0x7fd6d6d60600: 0x0000000100000001      0x0000001300000000
> 
> The memory at inflight_ops->data is not a valid svc_action_t object.
> 
> I saw a similar problem reported at http://lists.clusterlabs.org/pipermail/users/2017-January/004906.html,
> but that thread said the problem was fixed in 1.1.15.
> 
> Any help would be appreciated.

That's odd -- it does look like the issue fixed by commits 67d68df and
786ebc4 in 1.1.15.
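The class of bug the backtrace points to can be illustrated with a minimal, hypothetical C sketch: if an action is freed while a pointer to it is still held in the in-flight list, the next scan that compares resource names (the role is_op_blocked() plays in the backtrace) dereferences freed memory. Everything here (action_t, node_t, inflight_add(), inflight_remove(), is_blocked(), action_finished()) is a simplified stand-in invented for illustration. It is not pacemaker's code, and it does not claim to show what commits 67d68df and 786ebc4 actually changed.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() and strcasecmp() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for svc_action_t: only the resource name that
 * the blocking check compares is modelled. */
typedef struct action_s {
    char *rsc;
} action_t;

/* Hypothetical stand-in for the GList holding in-flight actions. */
typedef struct node_s {
    action_t *data;
    struct node_s *next;
} node_t;

static node_t *inflight_ops = NULL;

/* malloc() wrapper that aborts instead of returning NULL. */
static void *xmalloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        abort();
    }
    return p;
}

/* Allocate an action for the named resource. */
static action_t *action_new(const char *rsc) {
    action_t *op = xmalloc(sizeof(*op));
    op->rsc = strdup(rsc);
    if (op->rsc == NULL) {
        abort();
    }
    return op;
}

/* Record an action as in flight (prepend to the list). */
static void inflight_add(action_t *op) {
    node_t *n = xmalloc(sizeof(*n));
    n->data = op;
    n->next = inflight_ops;
    inflight_ops = n;
}

/* Unlink op from the in-flight list; returns 1 if it was found. */
static int inflight_remove(const action_t *op) {
    for (node_t **pp = &inflight_ops; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->data == op) {
            node_t *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Analogue of the blocking check: dereferences every list element and
 * compares resource names case-insensitively. If any element pointed
 * at freed memory, this is where the crash would occur. */
static int is_blocked(const char *rsc) {
    for (const node_t *n = inflight_ops; n != NULL; n = n->next) {
        if (strcasecmp(n->data->rsc, rsc) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Safe completion: unlink the action BEFORE freeing it, so no list
 * node is left holding a dangling pointer. */
static void action_finished(action_t *op) {
    int was_inflight = inflight_remove(op);
    assert(was_inflight); /* finishing an unknown action hints at a double free */
    (void)was_inflight;
    free(op->rsc);
    free(op);
}
```

For example, after `action_new("p_vip")` and `inflight_add(op)`, `is_blocked("p_vip")` returns 1; after `action_finished(op)` it returns 0 and the list is empty. Reversing the order in `action_finished()` (freeing before unlinking) would leave a node whose `data` points at freed memory, which is the kind of state the `x/10xg` dump of `inflight_ops->data` above appears to show.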

Are you absolutely sure that the node with the issue has 1.1.15, and
that pacemaker has been restarted since 1.1.15 was deployed?

If so, you may want to open an issue at bugs.clusterlabs.org and attach
the output of crm_report covering the time when the problem occurred.