[Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
renayama19661014 at ybb.ne.jp
Thu Oct 9 21:55:00 EDT 2014
Hi Andrew,
Okay!
I will test your patch and let you know the result.
Many thanks!
Hideo Yamauchi.
----- Original Message -----
> From: Andrew Beekhof <andrew at beekhof.net>
> To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Cc:
> Date: 2014/10/10, Fri 10:47
> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>
> Perfect!
>
> Can you try this:
>
> diff --git a/lib/services/services.c b/lib/services/services.c
> index 8590b56..cb0f0ae 100644
> --- a/lib/services/services.c
> +++ b/lib/services/services.c
> @@ -417,6 +417,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
> free(id);
>
> if (op == NULL) {
> + op->opaque->repeat_timer = 0;
> return FALSE;
> }
>
> @@ -425,6 +426,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
> } else {
> if (op->opaque->repeat_timer) {
> g_source_remove(op->opaque->repeat_timer);
> + op->opaque->repeat_timer = 0;
> }
> recurring_action_timer(op);
> return TRUE;
> @@ -459,6 +461,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
> if (dup->pid != 0) {
> if (op->opaque->repeat_timer) {
> g_source_remove(op->opaque->repeat_timer);
> + op->opaque->repeat_timer = 0;
> }
> recurring_action_timer(dup);
> }
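The patch applies a single idiom: once a source ID has been passed to g_source_remove(), zero the stored copy so that a later `if (op->opaque->repeat_timer)` check cannot remove the same ID a second time. Below is a minimal sketch of that idiom in plain C. It assumes a hypothetical toy source table (`toy_timeout_add()`, `toy_source_remove()` and `toy_fire()` are illustrative stand-ins, not GLib functions) so that it builds without GLib, and `struct toy_op` only loosely mirrors `op->opaque->repeat_timer`. As an extra assumption beyond the patch, it also clears the ID inside the timer callback before returning false, because returning false already removes the source.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy source table standing in for a GLib main context (NOT GLib).
 * Index = source ID; ID 0 is never handed out, so 0 means "no timer". */
#define TOY_MAX_SOURCES 8
typedef bool (*toy_func)(void *data);

static toy_func toy_sources[TOY_MAX_SOURCES + 1];
static void *toy_data[TOY_MAX_SOURCES + 1];
static unsigned toy_next_id = 1;
static unsigned toy_remove_failures;   /* counts "Source ID not found" errors */

static unsigned toy_timeout_add(toy_func fn, void *data)
{
    if (toy_next_id > TOY_MAX_SOURCES) {
        return 0;                       /* table full */
    }
    toy_sources[toy_next_id] = fn;
    toy_data[toy_next_id] = data;
    return toy_next_id++;
}

/* Like g_source_remove(): an unknown ID is reported and false is returned. */
static bool toy_source_remove(unsigned id)
{
    if (id == 0 || id > TOY_MAX_SOURCES || toy_sources[id] == NULL) {
        fprintf(stderr, "Source ID %u was not found when attempting to remove it\n", id);
        toy_remove_failures++;
        return false;
    }
    toy_sources[id] = NULL;
    return true;
}

/* Run one source's callback; a callback returning false detaches the source. */
static void toy_fire(unsigned id)
{
    if (id == 0 || id > TOY_MAX_SOURCES || toy_sources[id] == NULL) {
        return;
    }
    if (!toy_sources[id](toy_data[id])) {
        toy_sources[id] = NULL;
    }
}

/* Hypothetical struct, loosely mirroring op->opaque->repeat_timer. */
struct toy_op {
    unsigned repeat_timer;              /* 0 = no timer pending */
};

/* One-shot timer callback: returning false removes the source, so the
 * stored ID is cleared first to keep it from going stale. */
static bool toy_op_timer_cb(void *data)
{
    struct toy_op *op = data;
    op->repeat_timer = 0;
    return false;
}

/* The idiom from the patch: remove, then zero. Safe to call repeatedly. */
static void toy_op_stop_timer(struct toy_op *op)
{
    if (op->repeat_timer) {
        toy_source_remove(op->repeat_timer);
        op->repeat_timer = 0;
    }
}
```

With both assignments in place, stopping a timer that has already fired, or stopping the same timer twice, reports no "not found" error in this model; deleting the matching `op->repeat_timer = 0;` line makes that case fail.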
>
>
> On 10 Oct 2014, at 12:16 pm, renayama19661014 at ybb.ne.jp wrote:
>
>> Hi Andrew,
>>
>> I have not yet managed to set up gdb in the Ubuntu environment, so I cannot attach to lrmd and acquire a trace.
>> Please wait a little longer.
>>
>>
>> However, I modified lrmd to abort when g_source_remove() in cancel_recurring_action() returns FALSE.
>> -----
>> gboolean
>> cancel_recurring_action(svc_action_t * op)
>> {
>>     crm_info("Cancelling operation %s", op->id);
>>
>>     if (recurring_actions) {
>>         g_hash_table_remove(recurring_actions, op->id);
>>     }
>>
>>     if (op->opaque->repeat_timer) {
>>         if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
>>             abort();
>>         }
>> (snip)
>> -------core----
>> #0 0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>
>> 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) where
>> #0 0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> #1 0x00007f30aa613388 in __GI_abort () at abort.c:89
>> #2 0x00007f30aadcde77 in crm_abort (file=file@entry=0x7f30aae0152b "logging.c",
>> function=function@entry=0x7f30aae028c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73,
>> assert_condition=assert_condition@entry=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", do_core=do_core@entry=1,
>> do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
>> #3 0x00007f30aadf5ca7 in crm_glib_handler (log_domain=0x7f30aa35eb6e "GLib", flags=<optimized out>,
>> message=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
>> #4 0x00007f30aa320ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #5 0x00007f30aa320d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #6 0x00007f30aa318c5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #7 0x00007f30aabb2b55 in cancel_recurring_action (op=op@entry=0x19caa90) at services.c:363
>> #8 0x00007f30aabb2bee in services_action_cancel (name=name@entry=0x19d0530 "dummy3", action=<optimized out>, interval=interval@entry=10000)
>> at services.c:385
>> #9 0x000000000040405a in cancel_op (rsc_id=rsc_id@entry=0x19d0530 "dummy3", action=action@entry=0x19cec10 "monitor", interval=10000)
>> at lrmd.c:1404
>> #10 0x000000000040614f in process_lrmd_rsc_cancel (client=0x19c8290, id=74, request=0x19ca8a0) at lrmd.c:1468
>> #11 process_lrmd_message (client=client@entry=0x19c8290, id=74, request=request@entry=0x19ca8a0) at lrmd.c:1507
>> #12 0x0000000000402bac in lrmd_ipc_dispatch (c=0x19c79c0, data=<optimized out>, size=361) at main.c:148
>> #13 0x00007f30aa07b4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
>> #14 0x00007f30aadf209d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0x19c68a8) at mainloop.c:437
>> #15 0x00007f30aa319ce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #16 0x00007f30aa31a048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #17 0x00007f30aa31a30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fffcdd90b88) at main.c:344
>> ---------
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>> To: Andrew Beekhof <andrew at beekhof.net>
>>> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>> Date: 2014/10/7, Tue 11:15
>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>
>>> Hi Andrew,
>>>
>>>> Not quite. Returning FALSE from the callback also removes the source from glib.
>>>> So your test case effectively removes t1 twice: once implicitly by returning FALSE in timer_func1() and then again explicitly in timer_func3()
>>>
>>>
>>> You are right.
>>>
>>>
>>> As long as Pacemaker does not remove a timer source a second time after its callback has returned FALSE, glib does not report the error.
>>>
>>>
>>> Many Thanks,
>>> Hideo Yamauchi.
>>>
>>>
>>> ----- Original Message -----
>>>> From: Andrew Beekhof <andrew at beekhof.net>
>>>> To: renayama19661014 at ybb.ne.jp
>>>> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>> Date: 2014/10/7, Tue 11:06
>>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>
>>>>
>>>> On 7 Oct 2014, at 1:03 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>>>> These problems seem to be caused by the following glib change:
>>>>>>> * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>>
>>>>>> The glib behaviour on Ubuntu seems reasonable: removing a source multiple times IS a valid error.
>>>>>> I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>
>>>>>
>>>>> As far as I have confirmed, Pacemaker does not remove timer sources several times.
>>>>> In Ubuntu (glib 2.40), the error occurs on the very first removal.
>>>>
>>>> Not quite. Returning FALSE from the callback also removes the source from glib.
>>>> So your test case effectively removes t1 twice: once implicitly by returning FALSE in timer_func1() and then again explicitly in timer_func3()
>>>>
>>>>>
>>>>> To avoid the error on Ubuntu, it seems necessary to check that the source still exists before removing it.
>>>>> This also works well with the glib in RHEL6.x (and RHEL7.0):
>>>>>
>>>>> if (g_main_context_find_source_by_id(NULL, t1) != NULL) {
>>>>>     g_source_remove(t1);
>>>>> }
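Andrew's explanation and the lookup guard above can be checked against a small model. The sketch below is plain C on a hypothetical toy source table, not GLib: `toy_add()`, `toy_dispatch()`, `toy_find()` and `toy_remove()` are illustrative stand-ins, with `toy_find()` playing the role of `g_main_context_find_source_by_id(NULL, id) != NULL` and `toy_remove()` the role of g_source_remove(). It replays test2.c's t1: the callback returns false, so the loop has already detached the source, an unguarded removal then fails, and a lookup-guarded removal is simply skipped.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy source table (NOT GLib): attached[id] is true while
 * source `id` is still registered with the loop. ID 0 is never handed out. */
#define TOY_MAX 8
static bool attached[TOY_MAX + 1];
static unsigned next_id = 1;

static unsigned toy_add(void)
{
    if (next_id > TOY_MAX) {
        return 0;                       /* table full */
    }
    attached[next_id] = true;
    return next_id++;
}

/* Dispatch a source whose callback returned `keep`; false detaches it,
 * which is the implicit removal described in the thread. */
static void toy_dispatch(unsigned id, bool keep)
{
    if (id != 0 && id <= TOY_MAX && !keep) {
        attached[id] = false;
    }
}

/* Stand-in for g_main_context_find_source_by_id(NULL, id) != NULL. */
static bool toy_find(unsigned id)
{
    return id != 0 && id <= TOY_MAX && attached[id];
}

/* Stand-in for g_source_remove(): reports and returns false for an unknown ID. */
static bool toy_remove(unsigned id)
{
    if (!toy_find(id)) {
        fprintf(stderr, "Source ID %u was not found when attempting to remove it\n", id);
        return false;
    }
    attached[id] = false;
    return true;
}

/* The lookup guard: only remove an ID that is still registered. */
static void toy_remove_if_present(unsigned id)
{
    if (toy_find(id)) {
        toy_remove(id);
    }
}
```

One trade-off worth weighing: the guard silences the error, but the caller still holds a stale ID. Clearing the stored ID whenever the source goes away, whether by an explicit removal or by the callback returning FALSE, avoids needing the extra lookup.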
>>>>>
>>>>> I will send you the stack trace once I have acquired it.
>>>>>
>>>>> Many Thanks!
>>>>> Hideo Yamauchi.
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: Andrew Beekhof <andrew at beekhof.net>
>>>>>> To: renayama19661014 at ybb.ne.jp; The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>>>>> Cc:
>>>>>> Date: 2014/10/7, Tue 09:44
>>>>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>>>
>>>>>>
>>>>>> On 6 Oct 2014, at 4:09 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> When I run the following sample on RHEL6.5 (glib2-2.22.5-7.el6) and Ubuntu 14.04 (libglib2.0-0:amd64 2.40.0-2), the behavior differs.
>>>>>>>
>>>>>>> * Sample : test2.c
>>>>>>> {{{
>>>>>>> #include <stdio.h>
>>>>>>> #include <stdlib.h>
>>>>>>> #include <glib.h>
>>>>>>> #include <sys/times.h>
>>>>>>> guint t1, t2, t3;
>>>>>>> gboolean timer_func2(gpointer data){
>>>>>>>     printf("TIMER EXPIRE!2\n");
>>>>>>>     fflush(stdout);
>>>>>>>     return FALSE;
>>>>>>> }
>>>>>>> gboolean timer_func1(gpointer data){
>>>>>>>     clock_t ret;
>>>>>>>     struct tms buff;
>>>>>>>
>>>>>>>     ret = times(&buff);
>>>>>>>     printf("TIMER EXPIRE!1 %d\n", (int)ret);
>>>>>>>     fflush(stdout);
>>>>>>>     return FALSE;
>>>>>>> }
>>>>>>> gboolean timer_func3(gpointer data){
>>>>>>>     printf("TIMER EXPIRE 3!\n");
>>>>>>>     fflush(stdout);
>>>>>>>     printf("remove timer1!\n");
>>>>>>>     fflush(stdout);
>>>>>>>     g_source_remove(t1);
>>>>>>>     printf("remove timer2!\n");
>>>>>>>     fflush(stdout);
>>>>>>>     g_source_remove(t2);
>>>>>>>     printf("remove timer3!\n");
>>>>>>>     fflush(stdout);
>>>>>>>     g_source_remove(t3);
>>>>>>>     return FALSE;
>>>>>>> }
>>>>>>> int main(int argc, char** argv){
>>>>>>>     GMainLoop *m;
>>>>>>>     clock_t ret;
>>>>>>>     struct tms buff;
>>>>>>>     gint64 t;
>>>>>>>     m = g_main_new(FALSE);
>>>>>>>     t1 = g_timeout_add(1000, timer_func1, NULL);
>>>>>>>     t2 = g_timeout_add(60000, timer_func2, NULL);
>>>>>>>     t3 = g_timeout_add(5000, timer_func3, NULL);
>>>>>>>     ret = times(&buff);
>>>>>>>     printf("START! %d\n", (int)ret);
>>>>>>>     g_main_run(m);
>>>>>>> }
>>>>>>>
>>>>>>> }}}
>>>>>>> * Result
>>>>>>> ---- RHEL6.5(glib2-2.22.5-7.el6) ----
>>>>>>> [root at snmp1 ~]# ./test2
>>>>>>> START! 429576012
>>>>>>> TIMER EXPIRE!1 429576112
>>>>>>> TIMER EXPIRE 3!
>>>>>>> remove timer1!
>>>>>>> remove timer2!
>>>>>>> remove timer3!
>>>>>>>
>>>>>>> ---- Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2) ----
>>>>>>> root at a1be102:~# ./test2
>>>>>>> START! 1718163089
>>>>>>> TIMER EXPIRE!1 1718163189
>>>>>>> TIMER EXPIRE 3!
>>>>>>> remove timer1!
>>>>>>>
>>>>>>> (process:1410): GLib-CRITICAL **: Source ID 1 was not found when attempting to remove it
>>>>>>> remove timer2!
>>>>>>> remove timer3!
>>>>>>>
>>>>>>>
>>>>>>> These problems seem to be caused by the following glib change:
>>>>>>> * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>>
>>>>>> The glib behaviour on Ubuntu seems reasonable: removing a source multiple times IS a valid error.
>>>>>> I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>>
>>>>>>>
>>>>>>> Before this change, g_source_remove() could be called on a timer that had already fired and completed; after the change, the same call causes an error.
>>>>>>>
>>>>>>> As a result, we get the following crit error in environments where Pacemaker runs with a newer version of glib.
>>>>>>>
>>>>>>> lrmd[1632]: error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
>>>>>>> lrmd[1632]: crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
>>>>>>>
>>>>>>> It seems that Pacemaker needs some kind of workaround, considering the following:
>>>>>>> * Distributions, including Ubuntu, that ship a newer version of glib.
>>>>>>> * Future glib upgrades in RHEL.
>>>>>>>
>>>>>>> A similar problem has been reported on the mailing list:
>>>>>>> * http://www.gossamer-threads.com/lists/linuxha/pacemaker/91333#91333
>>>>>>> * http://www.gossamer-threads.com/lists/linuxha/pacemaker/92408
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>
>>>
>>>
>>
>