[Pacemaker] pacemaker processes RSS growth

Andrew Beekhof andrew at beekhof.net
Wed Sep 12 23:16:44 EDT 2012


On Thu, Sep 13, 2012 at 12:30 AM, Vladislav Bogdanov
<bubble at hoster-ok.com> wrote:
> 12.09.2012 10:35, Andrew Beekhof wrote:
>> On Tue, Sep 11, 2012 at 6:14 PM, Vladislav Bogdanov
>> <bubble at hoster-ok.com> wrote:
>>> 07.09.2012 09:25, Vladislav Bogdanov wrote:
>>>> 06.09.2012 12:58, Vladislav Bogdanov wrote:
>>>> ...
>>>>> lrmd seems not to clean up gio channels properly:
>>>>
>>>> I prefer to call g_io_channel_unref() right after g_io_add_watch_full()
>>>> instead of doing so when deleting descriptor (g_source_remove() is
>>>> enough there). So channel will be automatically freed after watch fd is
>>>> removed from poll with g_source_remove(). If I need to replace watch
>>>> flags, I call g_io_channel_ref()/g_io_channel_unref() around
>>>> g_source_remove()/g_io_add_watch_full() pair. This would require some
>>>> care and testing though. I have spent two days on this when writing my
>>>> first non-blocking server based on glib and gio. This code now works
>>>> like a charm with approach outlined above.
>>>
>>> Does it make some sense?
>>
>> Yes, I just needed to read it a few times before it sunk in :)
>> The interactions are pretty complex, before I commit anything I'd like
>> to be able to verify the refcounts are correct... how do you find out
>> the refcount for these glib objects?
>
> Frankly speaking just by try-and-fail method, while reading through its
> code. glib structures contain ref_count field, it is possible to reach
> it with some hacks.
> The main point with gio_channel is that both g_io_channel_unix_new() and
> g_io_add_watch_full() increment refcount, iirc, it is documented in glib
> sources.
>
> You can send me patches if you prefer some independent testing before
> commit. Hopefully I'll be able to test them quickly.

These are the two patches I'm going with (the second builds on the first):

+ Andrew Beekhof (16 hours ago) 6554777: High: Core: Ensure file
descriptors and IPC channels attached to mainloop are correctly
cleaned up on disconnection
+ Andrew Beekhof (15 hours ago) 2a160e9: High: Core: Ensure
server-side IPC connections are cleaned up on client exit  (HEAD,
origin/master, origin/HEAD, master)

Both are in my repo (not ClusterLabs yet).
They include terrible terrible hacks for displaying the ref_count
(only if tracing is enabled) and after reading the glib sources and
running some tests through valgrind (testing the graceful and
non-graceful termination of both sides), I'm quite confident the
combination is correct.

>
>>
>>>
>>> g_io_add_watch_full() refs channel, so after channel is created and its
>>> fd is put to mainloop with g_io_add_watch_full, channel has 2 in
>>> refcount. Thus, it is not freed after g_source_remove() is called once.
>>>
>>>>
>>>>>
>>>>> ==1734== 8,946 (8,520 direct, 426 indirect) bytes in 71 blocks are
>>>>> definitely lost in loss record 147 of 152
>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>> ==1734==    by 0x5067870: services_os_action_execute (services_linux.c:456)
>>>>> ==1734==    by 0x403AA6: lrmd_rsc_dispatch (lrmd.c:696)
>>>>> ==1734==    by 0x4E513C2: crm_trigger_dispatch (mainloop.c:105)
>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>> ==1734==
>>>>> ==1734== 8,946 (8,520 direct, 426 indirect) bytes in 71 blocks are
>>>>> definitely lost in loss record 148 of 152
>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>> ==1734==    by 0x50678AE: services_os_action_execute (services_linux.c:465)
>>>>> ==1734==    by 0x403AA6: lrmd_rsc_dispatch (lrmd.c:696)
>>>>> ==1734==    by 0x4E513C2: crm_trigger_dispatch (mainloop.c:105)
>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>> ==1734==
>>>>> ==1734== 65,394 (62,280 direct, 3,114 indirect) bytes in 519 blocks are
>>>>> definitely lost in loss record 151 of 152
>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>> ==1734==    by 0x5067870: services_os_action_execute (services_linux.c:456)
>>>>> ==1734==    by 0x50676B4: recurring_action_timer (services_linux.c:212)
>>>>> ==1734==    by 0x719161A: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>> ==1734==
>>>>> ==1734== 65,394 (62,280 direct, 3,114 indirect) bytes in 519 blocks are
>>>>> definitely lost in loss record 152 of 152
>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>> ==1734==    by 0x50678AE: services_os_action_execute (services_linux.c:465)
>>>>> ==1734==    by 0x50676B4: recurring_action_timer (services_linux.c:212)
>>>>> ==1734==    by 0x719161A: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in
>>>>> /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




More information about the Pacemaker mailing list