[Pacemaker] pacemaker processes RSS growth

Vladislav Bogdanov bubble at hoster-ok.com
Thu Sep 13 08:18:48 EDT 2012


13.09.2012 06:16, Andrew Beekhof wrote:
> On Thu, Sep 13, 2012 at 12:30 AM, Vladislav Bogdanov
> <bubble at hoster-ok.com> wrote:
>> 12.09.2012 10:35, Andrew Beekhof wrote:
>>> On Tue, Sep 11, 2012 at 6:14 PM, Vladislav Bogdanov
>>> <bubble at hoster-ok.com> wrote:
>>>> 07.09.2012 09:25, Vladislav Bogdanov wrote:
>>>>> 06.09.2012 12:58, Vladislav Bogdanov wrote:
>>>>> ...
>>>>>> lrmd seems not to clean up gio channels properly:
>>>>>
>>>>> I prefer to call g_io_channel_unref() right after g_io_add_watch_full()
>>>>> instead of doing so when deleting the descriptor (g_source_remove() is
>>>>> enough there). That way the channel is freed automatically once the
>>>>> watched fd is removed from the poll with g_source_remove(). If I need
>>>>> to replace the watch flags, I call g_io_channel_ref()/g_io_channel_unref()
>>>>> around the g_source_remove()/g_io_add_watch_full() pair. This would
>>>>> require some care and testing though. I spent two days on this when
>>>>> writing my first non-blocking server based on glib and gio; that code
>>>>> now works like a charm with the approach outlined above.
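>>>>>
>>>>> A minimal sketch of that pattern (hypothetical callback and
>>>>> function names; refcount values as documented in the glib sources):
>>>>>
>>>>> #include <glib.h>
>>>>>
>>>>> static gboolean
>>>>> dispatch_cb(GIOChannel *ch, GIOCondition cond, gpointer data)
>>>>> {
>>>>>     return TRUE;                    /* keep the watch */
>>>>> }
>>>>>
>>>>> static guint
>>>>> watch_fd(int fd)
>>>>> {
>>>>>     GIOChannel *ch = g_io_channel_unix_new(fd);    /* refcount 1 */
>>>>>     guint tag = g_io_add_watch_full(ch, G_PRIORITY_DEFAULT,
>>>>>                                     G_IO_IN | G_IO_HUP,
>>>>>                                     dispatch_cb, NULL, NULL);
>>>>>                                                    /* refcount 2 */
>>>>>     g_io_channel_unref(ch);  /* refcount 1: the watch owns the channel */
>>>>>     return tag;
>>>>> }
>>>>>
>>>>> /* replacing the watch flags: a temporary ref keeps the channel
>>>>>  * alive across the g_source_remove()/g_io_add_watch_full() pair */
>>>>> static guint
>>>>> change_flags(GIOChannel *ch, guint tag)
>>>>> {
>>>>>     g_io_channel_ref(ch);
>>>>>     g_source_remove(tag);
>>>>>     tag = g_io_add_watch_full(ch, G_PRIORITY_DEFAULT, G_IO_OUT,
>>>>>                               dispatch_cb, NULL, NULL);
>>>>>     g_io_channel_unref(ch);
>>>>>     return tag;
>>>>> }
>>>>>
>>>>> Final teardown is then just g_source_remove(tag): the source drops
>>>>> the last ref and the channel is freed automatically.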
>>>>
>>>> Does it make some sense?
>>>
>>> Yes, I just needed to read it a few times before it sunk in :)
>>> The interactions are pretty complex, before I commit anything I'd like
>>> to be able to verify the refcounts are correct... how do you find out
>>> the refcount for these glib objects?
>>
>> Frankly speaking, just by trial and error, while reading through its
>> code. glib structures contain a ref_count field; it is possible to
>> reach it with some hacks.
>> The main point with GIOChannel is that both g_io_channel_unix_new()
>> and g_io_add_watch_full() increment the refcount; iirc, this is
>> documented in the glib sources.
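>>
>> (The kind of hack I mean, roughly: GIOChannel's ref_count happens to
>> be the first, nominally private, field declared right in glib's
>> giochannel.h, so for tracing it can simply be read. A guess at what
>> such a helper might look like, hypothetical name, debug use only:)
>>
>> static int
>> gio_channel_refcount(const GIOChannel *ch)
>> {
>>     /* relies on glib's private struct layout; trace/debug only */
>>     return ch ? ch->ref_count : 0;
>> }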
>>
>> You can send me patches if you prefer some independent testing before
>> commit. Hopefully I'll be able to test them quickly.
> 
> These are the two patches I'm going with (the second builds on the first):
> 
> + Andrew Beekhof (16 hours ago) 6554777: High: Core: Ensure file
> descriptors and IPC channels attached to mainloop are correctly
> cleaned up on disconnection
> + Andrew Beekhof (15 hours ago) 2a160e9: High: Core: Ensure
> server-side IPC connections are cleaned up on client exit  (HEAD,
> origin/master, origin/HEAD, master)
> 
> Both are in my repo (not ClusterLabs yet).
> They include terrible terrible hacks for displaying the ref_count
> (only if tracing is enabled) and after reading the glib sources and
> running some tests through valgrind (testing the graceful and
> non-graceful termination of both sides), I'm quite confident the
> combination is correct.
> 

I built it with the following patch:
--- a/lib/common/mainloop.c     2012-09-13 10:28:20.000000000 +0000
+++ b/lib/common/mainloop.c     2012-09-13 10:30:03.065347404 +0000
@@ -724,6 +724,8 @@ mainloop_add_fd(
          *      -> g_source_callback_unref()
          * shortly after mainloop_gio_destroy() completes
          */
+        g_io_channel_unref(client->channel);
+
         crm_trace("Added connection %d for %s[%p].%d %d",
client->source, client->name, client, fd, mainloop_gio_refcount(client));
     }

and now it runs on my testing cluster.

IPC-related memory problems seem to be completely fixed now: the
processes' own memory (RES-SHR in htop terms) no longer grows (after 40
minutes), although I see that both the RES and SHR counters sometimes
increase synchronously. lrmd does not grow at all. I will look again
after a few hours.
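
(For reference, a small sketch of how to watch the same RES-SHR number
outside htop, assuming Linux /proc: the first three fields of
/proc/<pid>/statm are total, resident and shared pages:)

#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    long total, resident, shared;
    char path[64];
    FILE *f;

    /* pass a pid as the only argument; default to ourselves */
    snprintf(path, sizeof(path), "/proc/%s/statm",
             argc > 1 ? argv[1] : "self");
    f = fopen(path, "r");
    if (f == NULL
        || fscanf(f, "%ld %ld %ld", &total, &resident, &shared) != 3) {
        return 1;
    }
    fclose(f);
    printf("RES-SHR = %ld kB\n",
           (resident - shared) * (sysconf(_SC_PAGESIZE) / 1024));
    return 0;
}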

I noticed two minor problems. The first is:
Sep 13 10:49:50 vd01-a attrd[1732]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
Sep 13 10:49:51 vd01-a attrd[1732]:  warning: attrd_cib_callback: Update shutdown=(null) failed: No such device or address
Sep 13 10:49:51 vd01-a attrd[1732]:  warning: attrd_cib_callback: Update terminate=(null) failed: No such device or address
Sep 13 10:49:59 vd01-a attrd[1732]:  warning: attrd_cib_callback: Update probe_complete=(null) failed: No such device or address
Sep 13 10:50:01 vd01-a attrd[1732]:  warning: attrd_cib_callback: Update probe_complete=(null) failed: No such device or address
Sep 13 10:50:01 vd01-a attrd[1732]:  warning: attrd_cib_callback: Update probe_complete=(null) failed: No such device or address

The second is that after the node joined the cluster, I saw the
following in crm status:

Failed actions:
    libvirt-images-pool_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    libvirt-install-fs_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    libvirtd_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    multipathd_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    node-params_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    vds-ok-pool-dev-iscsi_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    vds-ok-pool-dev-pool_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    vds-ok-pool-dev-vg_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    vlan1-if_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    vlan101-if_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    vlan900-if_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    stonith-vd01-d-cdev_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    stonith-vd01-c-cdev_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    stonith-vd01-b-cdev_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error
    stonith-vd01-a-cdev_monitor_0 (node=vd01-a, call=-1, rc=1, status=Timed Out): unknown error

So, many resources were marked as failed with call id -1. That
disappeared within a second though. Should I care about that?

One more issue is an annoying message on the DC:
crit: get_timet_now: Defaulting to 'now'

And the last one (very minor?) is that I now sometimes see zombie crmd
children (e.g. Z 0:00 [LVM2] <defunct>). I can't recall whether I saw
them before. Has something changed in child reaping?
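
(For comparison, the usual zombie-free pattern in a glib mainloop
process; a self-contained sketch, not crmd's actual code. On UNIX, glib
reaps the child with waitpid() before firing the watch callback, so a
missing or broken child watch is one way to end up with <defunct>
entries:)

#include <glib.h>

static GMainLoop *loop;

static void
child_exited(GPid pid, gint status, gpointer data)
{
    /* by the time this runs, glib has already waitpid()ed the child,
     * so no zombie is left behind */
    g_message("child %d exited with status %d", (int) pid, status);
    g_spawn_close_pid(pid);
    g_main_loop_quit(loop);
}

int
main(void)
{
    GPid pid;
    gchar *argv[] = { "/bin/true", NULL };

    g_spawn_async(NULL, argv, NULL,
                  G_SPAWN_DO_NOT_REAP_CHILD, /* required for the watch */
                  NULL, NULL, &pid, NULL);
    g_child_watch_add(pid, child_exited, NULL);

    loop = g_main_loop_new(NULL, FALSE);
    /* without the watch (or a waitpid() elsewhere) the child would
     * linger as <defunct> until this process exits */
    g_main_loop_run(loop);
    return 0;
}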

Otherwise everything looks good now.

Ah, I forgot to mention that it is impossible to build pacemaker with
qb-based IPC if the cluster-glue headers are installed. Shouldn't that
check be made conditional? Like this:

--- a/configure.ac      2012-08-21 10:59:49.000000000 +0000
+++ b/configure.ac      2012-08-22 07:24:29.691100132 +0000
@@ -996,26 +996,33 @@

 LIBS="$LIBS $libqb_LIBS"

-AC_CHECK_HEADERS(heartbeat/hb_config.h)
-AC_CHECK_HEADERS(heartbeat/glue_config.h)
-AC_CHECK_HEADERS(stonith/stonith.h)
-AC_CHECK_HEADERS(agent_config.h)
 GLUE_HEADER=none
 HAVE_GLUE=0
-if test "$ac_cv_header_heartbeat_glue_config_h" = "yes";  then
-   GLUE_HEADER=glue_config.h
-   HAVE_GLUE=1
-
-elif test "$ac_cv_header_heartbeat_hb_config_h" = "yes"; then
-   GLUE_HEADER=hb_config.h
-   HAVE_GLUE=1
-else
-   AC_MSG_WARN(cluster-glue development headers were not found)
-fi

-if test "$ac_cv_header_stonith_stonith_h" = "yes";  then
-   PCMK_FEATURES="$PCMK_FEATURES lha-fencing"
-fi
+case $SUPPORT_HEARTBEAT in
+1|yes|true|try)
+    AC_CHECK_HEADERS(heartbeat/hb_config.h)
+    AC_CHECK_HEADERS(heartbeat/glue_config.h)
+    AC_CHECK_HEADERS(stonith/stonith.h)
+    AC_CHECK_HEADERS(agent_config.h)
+    if test "$ac_cv_header_heartbeat_glue_config_h" = "yes";  then
+       GLUE_HEADER=glue_config.h
+       HAVE_GLUE=1
+
+    elif test "$ac_cv_header_heartbeat_hb_config_h" = "yes"; then
+       GLUE_HEADER=hb_config.h
+       HAVE_GLUE=1
+    else
+       AC_MSG_WARN(cluster-glue development headers were not found)
+    fi
+
+    if test "$ac_cv_header_stonith_stonith_h" = "yes";  then
+       PCMK_FEATURES="$PCMK_FEATURES lha-fencing"
+    fi
+    ;;
+*)
+    ;;
+esac

 if test $HAVE_GLUE = 1; then
    dnl On Debian, AC_CHECK_LIBS fail if a library has any unresolved symbols


>>
>>>
>>>>
>>>> g_io_add_watch_full() refs the channel, so after the channel is
>>>> created and its fd is put into the mainloop with
>>>> g_io_add_watch_full(), the channel has a refcount of 2. Thus, it is
>>>> not freed after g_source_remove() is called once.
>>>>
>>>>>
>>>>>>
>>>>>> ==1734== 8,946 (8,520 direct, 426 indirect) bytes in 71 blocks are definitely lost in loss record 147 of 152
>>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>>> ==1734==    by 0x5067870: services_os_action_execute (services_linux.c:456)
>>>>>> ==1734==    by 0x403AA6: lrmd_rsc_dispatch (lrmd.c:696)
>>>>>> ==1734==    by 0x4E513C2: crm_trigger_dispatch (mainloop.c:105)
>>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>>> ==1734==
>>>>>> ==1734== 8,946 (8,520 direct, 426 indirect) bytes in 71 blocks are definitely lost in loss record 148 of 152
>>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>>> ==1734==    by 0x50678AE: services_os_action_execute (services_linux.c:465)
>>>>>> ==1734==    by 0x403AA6: lrmd_rsc_dispatch (lrmd.c:696)
>>>>>> ==1734==    by 0x4E513C2: crm_trigger_dispatch (mainloop.c:105)
>>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>>> ==1734==
>>>>>> ==1734== 65,394 (62,280 direct, 3,114 indirect) bytes in 519 blocks are definitely lost in loss record 151 of 152
>>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>>> ==1734==    by 0x5067870: services_os_action_execute (services_linux.c:456)
>>>>>> ==1734==    by 0x50676B4: recurring_action_timer (services_linux.c:212)
>>>>>> ==1734==    by 0x719161A: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>>> ==1734==
>>>>>> ==1734== 65,394 (62,280 direct, 3,114 indirect) bytes in 519 blocks are definitely lost in loss record 152 of 152
>>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>>> ==1734==    by 0x50678AE: services_os_action_execute (services_linux.c:465)
>>>>>> ==1734==    by 0x50676B4: recurring_action_timer (services_linux.c:212)
>>>>>> ==1734==    by 0x719161A: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in /lib64/libglib-2.0.so.0.2200.5)
>>>>>> ==1734==    by 0x402427: main (main.c:302)