[Pacemaker] pacemaker processes RSS growth

Thu Sep 6 20:03:04 EDT 2012

On 06/09/2012, at 10:43 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:

> 06.09.2012 12:58, Vladislav Bogdanov wrote:
>> 06.09.2012 12:39, Andrew Beekhof wrote:
>>> On Thu, Sep 6, 2012 at 5:33 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>> 06.09.2012 10:19, Andrew Beekhof wrote:
>>>>> On Thu, Sep 6, 2012 at 5:14 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I noticed that some pacemaker processes grow during operation (commit
>>>>>> 8535316). Running on top of corosync 2.0.1.
>>>>>> I notched RSS size (RES as htop reports) with interval of ~18 hours.
>>>>>> First column is notched after ~1 hour of operation.
>>>>>> 
>>>>>> Results are:
>>>>>> pengine      23568       23780
>>>>>> crmd         15592       17420
>>>>>> cib          12356       12380
>>>>>> lrmd          4396       14128
>>>>>> stonithd      3812        3812
>>>>>> attrd         3240        3244
>>>>>> pacemakerd    3104        3104
>>> 
>>> What unit are these values?
>> 
>> kb
>> 
>>> 
>>>>>> 
>>>>>> Cluster is totally static, except cluster-recheck-interval is set to
>>>>>> 3min. No actions had been taken after the first notch.
>>>>>> 
>>>>>> This make me think of some slow memory leaks in crmd and lrmd, but I
>>>>>> can't say that for sure because of glib.
>>>>>> 
>>>>>> I do not know if CTS or coverity cover this, so I can try to run under
>>>>>> valgrind if somebody give me instructions how to do that.
>>>>> 
>>>>> Check the bottom of /etc/sysconfig/pacemaker :-)
>>>> 
>>>> Heh :)
>>>> 
>>>>> Valgrind takes a heavy toll though... perhaps set
>>>> 
>>>> I know. That is a test cluster, so it doesn't matter.
>>>> 
>>>>> 
>>>>> export G_SLICE=always-malloc
>>>> 
>>>> Yes, I know. That does not help to deal with one-time glib
>>>> initializations though, like gtype one.
>>> 
>>> Right, but they should be accounted for already in the first column.
>> 
>> They usually take too much space in valgrind reports. Pacemaker doesn't
>> seem to be affected much though.
>> 
>>> 
>>>> May be you have appropriate
>>>> .supp for all daemons? I see only cts.supp, ptest.supp and cli.supp.
>>> 
>>> The PE shouldn't be leaking at all.
>>> Every 5th commit to master runs the PE and cli (which also exercises
>>> most of the cib) regression tests with valgrind and it hasn't reported
>>> anything.
>>> Stonithd appears to be in the clear too.
>> 
>> Yep, that two are OK.
>> 
>> cib shows tons of reachable memory in finished child processes. Not
>> important but log is huge, so it is very hard to find real errors.
> 
> I found two minor (very slow) leaks in cib too:
> 
> ==1732== 80 bytes in 20 blocks are still reachable in loss record 42 of 66
> ==1732==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
> ==1732==    by 0x410549: cib_get_operation_id (common.c:225)

This one is (bounded) initialisation code.  We should add it to the suppressions file.

> ==1732==    by 0x40B370: cib_process_request (callbacks.c:679)
> ==1732==    by 0x40D066: cib_common_callback_worker (callbacks.c:265)
> ==1732==    by 0x40D25E: cib_common_callback (callbacks.c:325)
> ==1732==    by 0x7995813: ??? (in /usr/lib64/libqb.so.0.14.1)
> ==1732==    by 0x7995B53: qb_ipcs_dispatch_connection_request (in
> /usr/lib64/libqb.so.0.14.1)
> ==1732==    by 0x526FA44: gio_read_socket (mainloop.c:353)
> ==1732==    by 0x74D1F0D: g_main_context_dispatch (in
> /lib64/libglib-2.0.so.0.2200.5)
> ==1732==    by 0x74D5937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
> ==1732==    by 0x74D5D54: g_main_loop_run (in
> /lib64/libglib-2.0.so.0.2200.5)
> ==1732==    by 0x40DB26: cib_init (main.c:551)
> ==1732==    by 0x40E18C: main (main.c:250)
> ==1732==
> 
> ==1732== 99 bytes in 3 blocks are still reachable in loss record 43 of 66
> ==1732==    at 0x4C25A28: calloc (vg_replace_malloc.c:467)
> ==1732==    by 0x525AF9E: crm_itoa (utils.c:325)
> ==1732==    by 0x4E3267C: get_node_uuid (cluster.c:166)
> ==1732==    by 0x4E3318F: crm_get_peer (membership.c:229)

This the uuid cache.
It will only grow if you add more nodes.

Probably the cib needs to call empty_uuid_cache() before exit.

> ==1732==    by 0x4E38524: pcmk_cpg_deliver (corosync.c:538)
> ==1732==    by 0x58A7EAE: cpg_dispatch (cpg.c:412)
> ==1732==    by 0x4E35981: pcmk_cpg_dispatch (corosync.c:577)
> ==1732==    by 0x5270966: mainloop_gio_callback (mainloop.c:533)
> ==1732==    by 0x74D1F0D: g_main_context_dispatch (in
> /lib64/libglib-2.0.so.0.2200.5)
> ==1732==    by 0x74D5937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
> ==1732==    by 0x74D5D54: g_main_loop_run (in
> /lib64/libglib-2.0.so.0.2200.5)
> ==1732==    by 0x40DB26: cib_init (main.c:551)
> ==1732==    by 0x40E18C: main (main.c:250)
> 
>> 
>> crmd has some errors, most annoying are following two (they repeat
>> constantly, and I see first one for other processes too):
>> 
>> ==1737== Syscall param socketcall.sendto(msg) points to uninitialised
>> byte(s)
>> ==1737==    at 0x7AE6002: send (in /lib64/libc-2.12.so)
>> ==1737==    by 0x869FE7F: qb_ipc_us_send (in /usr/lib64/libqb.so.0.13.0)
>> ==1737==    by 0x86A0FE5: qb_ipcc_us_setup_connect (in
>> /usr/lib64/libqb.so.0.13.0)

Not really a big issue (although I agree it looks like it should be).
If anything libqb needs to memset a stack variable to 0 before it sets a subset of the available fields.

Probably we should just suppress it.

>> ==1737==    by 0x869DD33: qb_ipcc_connect (in /usr/lib64/libqb.so.0.13.0)
>> ==1737==    by 0x5885459: crm_ipc_connect (ipc.c:341)
>> ==1737==    by 0x589DC27: mainloop_add_ipc_client (mainloop.c:613)
>> ==1737==    by 0x545C842: cib_native_signon_raw (cib_native.c:222)
>> ==1737==    by 0x41802D: do_cib_control (cib.c:167)
>> ==1737==    by 0x407726: s_crmd_fsa_actions (fsa.c:315)
>> ==1737==    by 0x408D85: s_crmd_fsa (fsa.c:250)
>> ==1737==    by 0x4109CD: crm_fsa_trigger (callbacks.c:251)
>> ==1737==    by 0x589C3C2: crm_trigger_dispatch (mainloop.c:105)
>> ==1737==    by 0x81DAF0D: g_main_context_dispatch (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x81DE937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x81DED54: g_main_loop_run (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x404D9D: crmd_init (main.c:139)
>> ==1737==    by 0x7A1BCDC: (below main) (in /lib64/libc-2.12.so)
>> ==1737==  Address 0x7feffd514 is on thread 1's stack
>> ==1737==
>> ==1737== Invalid read of size 8
>> ==1737==    at 0x7B2FD44: __strspn_sse42 (in /lib64/libc-2.12.so)
>> ==1737==    by 0x588AA53: crm_get_msec (utils.c:640)

This i dont understand.

>> ==1737==    by 0x588AD28: check_time (utils.c:85)
>> ==1737==    by 0x588C8E8: cluster_option (utils.c:215)
>> ==1737==    by 0x588CB8D: verify_all_options (utils.c:287)
>> ==1737==    by 0x40B9AA: config_query_callback (control.c:779)
>> ==1737==    by 0x5457393: cib_native_callback (cib_utils.c:631)
>> ==1737==    by 0x545CD86: cib_native_dispatch_internal (cib_native.c:120)
>> ==1737==    by 0x589D71F: mainloop_gio_callback (mainloop.c:522)
>> ==1737==    by 0x81DAF0D: g_main_context_dispatch (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x81DE937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x81DED54: g_main_loop_run (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x404D9D: crmd_init (main.c:139)
>> ==1737==    by 0x7A1BCDC: (below main) (in /lib64/libc-2.12.so)
>> ==1737==  Address 0xc158bd0 is 0 bytes inside a block of size 4 alloc'd
>> ==1737==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>> ==1737==    by 0x7A7D871: strdup (in /lib64/libc-2.12.so)
>> ==1737==    by 0x588CA70: cluster_option (utils.c:211)
>> ==1737==    by 0x588CB8D: verify_all_options (utils.c:287)
>> ==1737==    by 0x40B9AA: config_query_callback (control.c:779)
>> ==1737==    by 0x5457393: cib_native_callback (cib_utils.c:631)
>> ==1737==    by 0x545CD86: cib_native_dispatch_internal (cib_native.c:120)
>> ==1737==    by 0x589D71F: mainloop_gio_callback (mainloop.c:522)
>> ==1737==    by 0x81DAF0D: g_main_context_dispatch (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x81DE937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x81DED54: g_main_loop_run (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x404D9D: crmd_init (main.c:139)
>> ==1737==    by 0x7A1BCDC: (below main) (in /lib64/libc-2.12.so)
>> 
>> And I suspect some leakage here:

Looks it.

>> 
>> ==1737== 51,314 (1,968 direct, 49,346 indirect) bytes in 82 blocks are
>> definitely lost in loss record 248 of 249
>> ==1737==    at 0x4C25A28: calloc (vg_replace_malloc.c:467)
>> ==1737==    by 0x5CB77E1: lrmd_key_value_add (lrmd_client.c:100)
>> ==1737==    by 0x41E199: do_lrm_rsc_op (lrm.c:1701)
>> ==1737==    by 0x420779: do_lrm_invoke (lrm.c:1450)
>> ==1737==    by 0x40C423: send_msg_via_ipc (messages.c:939)
>> ==1737==    by 0x40D2DF: relay_message (messages.c:454)
>> ==1737==    by 0x40F827: route_message (messages.c:322)
>> ==1737==    by 0x411A6E: crmd_ha_msg_filter (callbacks.c:96)
>> ==1737==    by 0x40547F: crmd_ais_dispatch (corosync.c:108)
>> ==1737==    by 0x5673569: pcmk_cpg_deliver (corosync.c:551)
>> ==1737==    by 0x65B0EAE: cpg_dispatch (cpg.c:412)
>> ==1737==    by 0x5670981: pcmk_cpg_dispatch (corosync.c:577)
>> ==1737==    by 0x589D966: mainloop_gio_callback (mainloop.c:533)
>> ==1737==    by 0x81DAF0D: g_main_context_dispatch (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x81DE937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x81DED54: g_main_loop_run (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1737==    by 0x404D9D: crmd_init (main.c:139)
>> ==1737==    by 0x7A1BCDC: (below main) (in /lib64/libc-2.12.so)
>> 
>> lrmd seems not to clean up gio channels properly:

Hmmm.  I'll try and get most of these fixed today.

>> 
>> ==1734== 8,946 (8,520 direct, 426 indirect) bytes in 71 blocks are
>> definitely lost in loss record 147 of 152
>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>> ==1734==    by 0x5067870: services_os_action_execute (services_linux.c:456)
>> ==1734==    by 0x403AA6: lrmd_rsc_dispatch (lrmd.c:696)
>> ==1734==    by 0x4E513C2: crm_trigger_dispatch (mainloop.c:105)
>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7194D54: g_main_loop_run (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x402427: main (main.c:302)
>> ==1734==
>> ==1734== 8,946 (8,520 direct, 426 indirect) bytes in 71 blocks are
>> definitely lost in loss record 148 of 152
>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>> ==1734==    by 0x50678AE: services_os_action_execute (services_linux.c:465)
>> ==1734==    by 0x403AA6: lrmd_rsc_dispatch (lrmd.c:696)
>> ==1734==    by 0x4E513C2: crm_trigger_dispatch (mainloop.c:105)
>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7194D54: g_main_loop_run (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x402427: main (main.c:302)
>> ==1734==
>> ==1734== 65,394 (62,280 direct, 3,114 indirect) bytes in 519 blocks are
>> definitely lost in loss record 151 of 152
>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>> ==1734==    by 0x5067870: services_os_action_execute (services_linux.c:456)
>> ==1734==    by 0x50676B4: recurring_action_timer (services_linux.c:212)
>> ==1734==    by 0x719161A: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7194D54: g_main_loop_run (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x402427: main (main.c:302)
>> ==1734==
>> ==1734== 65,394 (62,280 direct, 3,114 indirect) bytes in 519 blocks are
>> definitely lost in loss record 152 of 152
>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>> ==1734==    by 0x50678AE: services_os_action_execute (services_linux.c:465)
>> ==1734==    by 0x50676B4: recurring_action_timer (services_linux.c:212)
>> ==1734==    by 0x719161A: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x7194D54: g_main_loop_run (in
>> /lib64/libglib-2.0.so.0.2200.5)
>> ==1734==    by 0x402427: main (main.c:302)
>> 
>> 
>>> 
>>> The only one I'm really concerned about is the lrmd.
>>> 
>>>> 
>>>> Should I make full cluster restart or rolling one is ok?
>>> 
>>> To have the sysconfig values take effect?  Either.
>> 
>> To correctly find all possible leaks which depend on execution path.
>> 
>>> 
>>>> 
>>>>> 
>>>>> first to rule out glib's funky allocator.
>>>>> 
>>>>>> 
>>>>>> I can send CIB contents if needed.
>>>>>> 
>>>>>> Vladislav
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>> 
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>> 
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>> 
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>> 
>> 
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org