[Pacemaker] lrmadmin -C blocks on subsequent invocations
Andrew Beekhof
andrew at beekhof.net
Fri Nov 26 04:47:06 EST 2010
On Tue, Nov 23, 2010 at 11:36 PM, Dave Williams <dave at opensourcesolutions.co.uk> wrote:
> On 21:59, Mon 22 Nov 10, Dave Williams wrote:
>> A backtrace from gdb shows lrmd stuck in a lock wait:
>> #0 0x00007f7e5f8ba6b4 in __lll_lock_wait () from /lib/libpthread.so.0
>> #1 0x00007f7e5f8b5849 in _L_lock_953 () from /lib/libpthread.so.0
>> #2 0x00007f7e5f8b566b in pthread_mutex_lock () from /lib/libpthread.so.0
>> #3 0x00007f7e601b0806 in g_main_context_find_source_by_id () from /lib/libglib-2.0.so.0
>> #4 0x00007f7e601b08fe in g_source_remove () from /lib/libglib-2.0.so.0
>> #5 0x00007f7e61568ba1 in G_main_del_IPC_Channel (chp=0x11deed0) at GSource.c:495
>> #6 0x00000000004065a1 in on_remove_client (user_data=0x11df8e0) at lrmd.c:1526
>> #7 0x00007f7e615694ca in G_CH_destroy_int (source=0x11deed0) at GSource.c:675
>> #8 0x00007f7e601adc11 in ?? () from /lib/libglib-2.0.so.0
>> #9 0x00007f7e601ae428 in g_main_context_dispatch () from /lib/libglib-2.0.so.0
>> #10 0x00007f7e601b22a8 in ?? () from /lib/libglib-2.0.so.0
>> #11 0x00007f7e601b27b5 in g_main_loop_run () from /lib/libglib-2.0.so.0
>> #12 0x0000000000405d32 in init_start () at lrmd.c:1267
>> #13 0x0000000000404f7a in main (argc=1, argv=0x7fff91e24478) at lrmd.c:835
>>
>
> OK, having spent an evening looking at the source code, this is what I
> understand: when the lrmadmin client disconnects from lrmd's cmd socket
> (having got what it needs), lrmd is left to tidy up by deleting the
> client's event source from the GLib GMainContext loop. It does this by
> calling g_source_remove(), which then hangs deep inside GLib on a mutex
> lock.
>
> On the surface the overall sequence makes sense, but the hang doesn't,
> and it clearly shouldn't happen. I am at a loss as to whether it is a
> GLib issue (unlikely, I would have thought?) or an lrmd bug.
>
> lrmd should NEVER hang! Can anyone help?
>
> Are there any other mailing lists I can try?
They are discussing something similar over on linux-ha.
Are you using upstart resources in the cluster by any chance?