[Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
renayama19661014 at ybb.ne.jp
Thu Oct 9 21:16:15 EDT 2014
Hi Andrew,
I have not yet been able to set up gdb properly in the Ubuntu environment, so I cannot attach to lrmd and capture a trace yet.
Please give me a little more time.
However, I made lrmd abort when g_source_remove() in cancel_recurring_action() returned FALSE, and got the core below.
-----
gboolean
cancel_recurring_action(svc_action_t * op)
{
    crm_info("Cancelling operation %s", op->id);
    if (recurring_actions) {
        g_hash_table_remove(recurring_actions, op->id);
    }
    if (op->opaque->repeat_timer) {
        if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
            abort();
        }
(snip)
-------core----
#0 0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0 0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f30aa613388 in __GI_abort () at abort.c:89
#2 0x00007f30aadcde77 in crm_abort (file=file@entry=0x7f30aae0152b "logging.c",
function=function@entry=0x7f30aae028c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73,
assert_condition=assert_condition@entry=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", do_core=do_core@entry=1,
do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
#3 0x00007f30aadf5ca7 in crm_glib_handler (log_domain=0x7f30aa35eb6e "GLib", flags=<optimized out>,
message=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
#4 0x00007f30aa320ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007f30aa320d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007f30aa318c5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7 0x00007f30aabb2b55 in cancel_recurring_action (op=op@entry=0x19caa90) at services.c:363
#8 0x00007f30aabb2bee in services_action_cancel (name=name@entry=0x19d0530 "dummy3", action=<optimized out>, interval=interval@entry=10000)
at services.c:385
#9 0x000000000040405a in cancel_op (rsc_id=rsc_id@entry=0x19d0530 "dummy3", action=action@entry=0x19cec10 "monitor", interval=10000)
at lrmd.c:1404
#10 0x000000000040614f in process_lrmd_rsc_cancel (client=0x19c8290, id=74, request=0x19ca8a0) at lrmd.c:1468
#11 process_lrmd_message (client=client@entry=0x19c8290, id=74, request=request@entry=0x19ca8a0) at lrmd.c:1507
#12 0x0000000000402bac in lrmd_ipc_dispatch (c=0x19c79c0, data=<optimized out>, size=361) at main.c:148
#13 0x00007f30aa07b4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
#14 0x00007f30aadf209d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0x19c68a8) at mainloop.c:437
#15 0x00007f30aa319ce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#16 0x00007f30aa31a048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007f30aa31a30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fffcdd90b88) at main.c:344
---------
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: Andrew Beekhof <andrew at beekhof.net>
> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Date: 2014/10/7, Tue 11:15
> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>
> Hi Andrew,
>
>> Not quite. Returning FALSE from the callback also removes the source from glib.
>> So your test case effectively removes t1 twice: once implicitly by returning
>> FALSE in timer_func1() and then again explicitly in timer_func3()
>
>
> You are right.
>
>
> If Pacemaker does not remove again a source whose timer callback has already
> returned FALSE, glib does not report the error.
>
>
> Many Thanks,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
>> From: Andrew Beekhof <andrew at beekhof.net>
>> To: renayama19661014 at ybb.ne.jp
>> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>> Date: 2014/10/7, Tue 11:06
>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>
>>
>> On 7 Oct 2014, at 1:03 pm, renayama19661014 at ybb.ne.jp wrote:
>>
>>> Hi Andrew,
>>>
>>>>> These problems seem to be caused by the following glib change.
>>>>> * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>
>>>> The glib behaviour on Ubuntu seems reasonable, removing a source multiple times
>>>> IS a valid error.
>>>> I need the stack trace to know where/how this situation can occur in pacemaker.
>>>
>>>
>>> As far as I could confirm, Pacemaker does not remove a source more than once.
>>> On Ubuntu (glib 2.40), the error occurs even on the first removal.
>>
>> Not quite. Returning FALSE from the callback also removes the source from glib.
>> So your test case effectively removes t1 twice: once implicitly by returning
>> FALSE in timer_func1() and then again explicitly in timer_func3()
>>
>>>
>>> To avoid the error on Ubuntu, it seems necessary to check that the source
>>> still exists before removing it.
>>> This also works well with the glib of RHEL6.x (and RHEL7.0).
>>>
>>> if (g_main_context_find_source_by_id (NULL, t1) != NULL) {
>>>     g_source_remove(t1);
>>> }
>>>
>>> I will send you the stack trace once I have captured it.
>>>
>>> Many Thanks!
>>> Hideo Yamauchi.
>>>
>>> ----- Original Message -----
>>>> From: Andrew Beekhof <andrew at beekhof.net>
>>>> To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>> Cc:
>>>> Date: 2014/10/7, Tue 09:44
>>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>
>>>>
>>>> On 6 Oct 2014, at 4:09 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> When I run the following sample on RHEL6.5(glib2-2.22.5-7.el6) and
>>>>> Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2), the behaviour differs.
>>>>>
>>>>> * Sample : test2.c
>>>>> {{{
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <glib.h>
>>>>> #include <sys/times.h>
>>>>>
>>>>> guint t1, t2, t3;    /* source IDs of the three timers */
>>>>>
>>>>> /* 60s timer: removed by timer_func3() before it can fire. */
>>>>> gboolean timer_func2(gpointer data){
>>>>>     printf("TIMER EXPIRE!2\n");
>>>>>     fflush(stdout);
>>>>>     return FALSE;
>>>>> }
>>>>>
>>>>> /* 1s timer: prints the current tick count and does not repeat. */
>>>>> gboolean timer_func1(gpointer data){
>>>>>     clock_t ret;
>>>>>     struct tms buff;
>>>>>
>>>>>     ret = times(&buff);
>>>>>     printf("TIMER EXPIRE!1 %d\n", (int)ret);
>>>>>     fflush(stdout);
>>>>>     return FALSE;
>>>>> }
>>>>>
>>>>> /* 5s timer: removes all three timers, including t1, which has already fired. */
>>>>> gboolean timer_func3(gpointer data){
>>>>>     printf("TIMER EXPIRE 3!\n");
>>>>>     fflush(stdout);
>>>>>     printf("remove timer1!\n");
>>>>>     fflush(stdout);
>>>>>     g_source_remove(t1);
>>>>>     printf("remove timer2!\n");
>>>>>     fflush(stdout);
>>>>>     g_source_remove(t2);
>>>>>     printf("remove timer3!\n");
>>>>>     fflush(stdout);
>>>>>     g_source_remove(t3);
>>>>>     return FALSE;
>>>>> }
>>>>>
>>>>> int main(int argc, char** argv){
>>>>>     GMainLoop *m;
>>>>>     clock_t ret;
>>>>>     struct tms buff;
>>>>>
>>>>>     m = g_main_loop_new(NULL, FALSE);
>>>>>     t1 = g_timeout_add(1000, timer_func1, NULL);
>>>>>     t2 = g_timeout_add(60000, timer_func2, NULL);
>>>>>     t3 = g_timeout_add(5000, timer_func3, NULL);
>>>>>     ret = times(&buff);
>>>>>     printf("START! %d\n", (int)ret);
>>>>>     g_main_loop_run(m);    /* never quits; stop the program with Ctrl-C */
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> }}}
>>>>> * Result
>>>>> ---- RHEL6.5(glib2-2.22.5-7.el6) ----
>>>>> [root@snmp1 ~]# ./test2
>>>>> START! 429576012
>>>>> TIMER EXPIRE!1 429576112
>>>>> TIMER EXPIRE 3!
>>>>> remove timer1!
>>>>> remove timer2!
>>>>> remove timer3!
>>>>>
>>>>> ---- Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2) ----
>>>>> root@a1be102:~# ./test2
>>>>> START! 1718163089
>>>>> TIMER EXPIRE!1 1718163189
>>>>> TIMER EXPIRE 3!
>>>>> remove timer1!
>>>>>
>>>>> (process:1410): GLib-CRITICAL **: Source ID 1 was not found when attempting to remove it
>>>>> remove timer2!
>>>>> remove timer3!
>>>>>
>>>>>
>>>>> These problems seem to be caused by the following glib change.
>>>>> * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>
>>>> The glib behaviour on Ubuntu seems reasonable, removing a source multiple times
>>>> IS a valid error.
>>>> I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>
>>>>>
>>>>> With g_source_remove() before the change, it was possible to remove a timer
>>>>> that had already finished, but g_source_remove() after the change reports an
>>>>> error.
>>>>>
>>>>> Because of this, we get the following crit error in a Pacemaker environment
>>>>> that uses a new version of glib.
>>>>>
>>>>> lrmd[1632]: error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
>>>>> lrmd[1632]: crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
>>>>>
>>>>> It seems that Pacemaker needs to handle this in some way, considering the following:
>>>>> * Distributions that use a new version of glib, including Ubuntu.
>>>>> * Future glib version upgrades in RHEL.
>>>>>
>>>>> A similar problem has been reported on the ML.
>>>>> * http://www.gossamer-threads.com/lists/linuxha/pacemaker/91333#91333
>>>>> * http://www.gossamer-threads.com/lists/linuxha/pacemaker/92408
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>
>