[Pacemaker] Pacemaker may still contain memory leaks

Vladislav Bogdanov bubble at hoster-ok.com
Wed May 29 04:19:58 EDT 2013


29.05.2013 11:01, Andrew Beekhof wrote:
> 
> On 28/05/2013, at 4:30 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
>>
>> On 28/05/2013, at 10:12 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>>>
>>> On 27/05/2013, at 5:08 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>
>>>> 27.05.2013 04:20, Yuichi SEINO wrote:
>>>>> Hi,
>>>>>
>>>>> 2013/5/24 Vladislav Bogdanov <bubble at hoster-ok.com>:
>>>>>> 24.05.2013 06:34, Andrew Beekhof wrote:
>>>>>>> Any help figuring out where the leaks might be would be very much appreciated :)
>>>>>>
>>>>>> The one (and only) suspect is unfortunately crmd itself.
>>>>>> Its private heap has grown from 2708 to 3680 kB.
>>>>>>
>>>>>> All other relevant differences are in qb shm buffers, which are
>>>>>> controlled and may grow until they reach the configured size.
>>>>>>
>>>>>> @Yuichi
>>>>>> I would recommend trying to run it under valgrind on a test cluster to
>>>>>> figure out whether that is a memleak (lost memory) or some history data
>>>>>> (referenced memory). The latter may still be a logical memleak, though.
>>>>>> You can look in /etc/sysconfig/pacemaker for details.
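
(For reference, the settings meant here would look roughly like the sketch
below; this is from memory of the sysconfig template, so PCMK_valgrind_enabled
and the log path are assumptions and may differ between versions.)

  # /etc/sysconfig/pacemaker -- illustrative sketch only
  PCMK_valgrind_enabled=crmd          # run just crmd under valgrind (name as in the template; may differ)
  VALGRIND_OPTS="--leak-check=full --num-callers=25 --log-file=/tmp/pcmk-valgrind-%p.log"
  G_SLICE=always-malloc               # bypass glib's slice allocator so valgrind sees every allocation
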
>>>>>
>>>>> I ran valgrind for about 2 days, and I attached the valgrind results for
>>>>> the ACT node and the SBY node.
>>>>
>>>>
>>>> I do not see any "direct" memory leaks (repeating 'definitely-lost'
>>>> allocations) there.
>>>>
>>>> So what we see is probably one of:
>>>> * Cache/history/etc., which grows up to some limit (or is expired at
>>>> some point in time).
>>>> * Unlimited/non-expirable lists/hashes of data structures, which are
>>>> correctly freed at exit
>>>
>>> There are still plenty of memory chunks not free'd at exit; I'm slowly working through those.
>>
>> I've pushed the following to my repo:
>>
>> + Andrew Beekhof (2 hours ago) d070092: Test: More glib suppressions 
>> + Andrew Beekhof (2 hours ago) ec74bf0: Fix: Fencing: Ensure API object is consistently free'd 
>> + Andrew Beekhof (2 hours ago) 6130d23: Fix: Free additional memory at exit 
>> + Andrew Beekhof (2 hours ago) b76d6be: Refactor: crmd: Allocate a mainloop before doing anything to help valgrind 
>> + Andrew Beekhof (3 hours ago) d4041de: Log: init: Remove unnecessary detail from shutdown message 
>> + Andrew Beekhof (3 hours ago) 282032b: Fix: Clean up internal mainloop structures at exit 
>> + Andrew Beekhof (4 hours ago) 0947721: Fix: Core: Correctly unreference GSource inputs 
>> + Andrew Beekhof (25 hours ago) d94140d: Fix: crmd: Clean up more memory before exit 
>> + Andrew Beekhof (25 hours ago) b44257c: Test: cman: Ignore additional valgrind errors 
>>
>> If someone would like to run the cluster (no valgrind needed) for a while with
>>
>> export PCMK_trace_functions=mainloop_gio_destroy,mainloop_add_fd,mainloop_del_fd,crmd_exit,crm_peer_destroy,empty_uuid_cache,lrm_state_destroy_all,internal_lrm_state_destroy,do_stop,mainloop_destroy_trigger,mainloop_setup_trigger,do_startup,stonith_api_delete
>>
>> and then (after grabbing smaps) shut it down, we should have some information about any lists/hashes that are growing too large.
>>
>> Also, be sure to run with:
>>
>> export G_SLICE=always-malloc
>>
>> which will prevent glib from accumulating pools of memory and distorting any results.
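
For what it's worth, a minimal sketch of that procedure (assuming an EL6-style
init script that sources /etc/sysconfig/pacemaker; the capture path, interval
and "service pacemaker" name are just examples):

  # 1. Make the two settings visible to the daemons at startup
  #    (the init script sources this file before launching pacemaker; EL6-style assumed).
  echo "export G_SLICE=always-malloc" >> /etc/sysconfig/pacemaker
  echo "export PCMK_trace_functions=mainloop_gio_destroy,mainloop_add_fd,mainloop_del_fd,crmd_exit,crm_peer_destroy,empty_uuid_cache,lrm_state_destroy_all,internal_lrm_state_destroy,do_stop,mainloop_destroy_trigger,mainloop_setup_trigger,do_startup,stonith_api_delete" >> /etc/sysconfig/pacemaker
  service pacemaker restart

  # 2. Grab crmd's smaps once an hour while the cluster is being exercised.
  while sleep 3600; do
      cat /proc/$(pidof crmd)/smaps > /var/tmp/crmd-smaps.$(date +%s)
  done

  # 3. Afterwards, shut pacemaker down and check the log for the traced functions.
  service pacemaker stop
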
> 
> 
> I did this today with 2747e25 and it looks to me like there is no leak (anymore?)
> For context, between smaps.5 and smaps.6, the 4 node cluster ran over 120 "standby" tests (lots of PE runs and resource activity).
> So unless someone can show me otherwise, I'm going to move on :)

I would say I'm convinced ;)
I'd bet that is because of 0947721; glib programming is not always
intuitive (you surely remember that bug with the IO watches).
And the GSources are probably only destroyed when you exit the mainloop,
which is why we do not see that in valgrind.
Hopefully the mainloop/gio code is now rock solid.

Is this the DC or an ordinary member, btw?

> 
> Note that the [heap] changes are actually the memory usage going _backwards_.
> 
> Raw results below.
> 
> [root at corosync-host-1 ~]# cat /proc/`pidof crmd`/smaps  > smaps.6 ; diff -u smaps.5 smaps.6;
> --- smaps.5	2013-05-29 02:39:25.032940230 -0400
> +++ smaps.6	2013-05-29 03:48:51.278940819 -0400
> @@ -40,16 +40,16 @@
>  Swap:                  0 kB
>  KernelPageSize:        4 kB
>  MMUPageSize:           4 kB
> -0226b000-02517000 rw-p 00000000 00:00 0                                  [heap]
> -Size:               2736 kB
> -Rss:                2268 kB
> -Pss:                2268 kB
> +0226b000-02509000 rw-p 00000000 00:00 0                                  [heap]
> +Size:               2680 kB
> +Rss:                2212 kB
> +Pss:                2212 kB
>  Shared_Clean:          0 kB
>  Shared_Dirty:          0 kB
>  Private_Clean:         0 kB
> -Private_Dirty:      2268 kB
> -Referenced:         2268 kB
> -Anonymous:          2268 kB
> +Private_Dirty:      2212 kB
> +Referenced:         2212 kB
> +Anonymous:          2212 kB
>  AnonHugePages:         0 kB
>  Swap:                  0 kB
>  KernelPageSize:        4 kB
> @@ -112,13 +112,13 @@
>  MMUPageSize:           4 kB
>  7f0c6e918000-7f0c6ee18000 rw-s 00000000 00:10 522579                     /dev/shm/qb-pengine-event-27411-27412-6-data
>  Size:               5120 kB
> -Rss:                3572 kB
> -Pss:                1785 kB
> +Rss:                4936 kB
> +Pss:                2467 kB
>  Shared_Clean:          0 kB
> -Shared_Dirty:       3572 kB
> +Shared_Dirty:       4936 kB
>  Private_Clean:         0 kB
>  Private_Dirty:         0 kB
> -Referenced:         3572 kB
> +Referenced:         4936 kB
>  Anonymous:             0 kB
>  AnonHugePages:         0 kB
>  Swap:                  0 kB
> @@ -841,7 +841,7 @@
>  7f0c72b00000-7f0c72b1d000 r-xp 00000000 fd:00 119                        /lib64/libselinux.so.1
>  Size:                116 kB
>  Rss:                  36 kB
> -Pss:                   5 kB
> +Pss:                   4 kB
>  Shared_Clean:         36 kB
>  Shared_Dirty:          0 kB
>  Private_Clean:         0 kB
> @@ -1401,7 +1401,7 @@
>  7f0c740c6000-7f0c74250000 r-xp 00000000 fd:00 45                         /lib64/libc-2.12.so
>  Size:               1576 kB
>  Rss:                 588 kB
> -Pss:                  20 kB
> +Pss:                  19 kB
>  Shared_Clean:        588 kB
>  Shared_Dirty:          0 kB
>  Private_Clean:         0 kB
> 
> 
>>
>>
>>> Once we know all memory is being cleaned up, the next step is to check the size of things beforehand.
>>>
>>> I'm hoping one or more of them show up as unnaturally large, indicating things are being added but not removed.
>>>
>>>> (e.g. like dlm_controld has (had?) for a debugging buffer, or like the
>>>> glibc resolver had in EL3). This cannot be caught with valgrind if you
>>>> use it in the standard way.
>>>>
>>>> I believe we have the former. To prove that, it would be very
>>>> interesting to run under the valgrind *debugger* (--vgdb=yes|full) for a
>>>> long enough period of time (2-3 weeks) and periodically get the memory
>>>> allocation state from there (with the 'monitor leak_check full reachable
>>>> any' gdb command). I wanted to do that a long time ago, but
>>>> unfortunately did not have enough spare time to even try it (although I
>>>> have valgrinded other programs that way).
>>>>
>>>> This is described in valgrind documentation:
>>>> http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
>>>>
>>>> We probably do not need to specify '--vgdb-error=0', because we do not
>>>> need to install watchpoints at the start (and we do not need/want to
>>>> immediately connect to crmd with gdb to tell it to continue); we just
>>>> need to periodically get the status of memory allocations (a
>>>> stop-leak_check-cont sequence). That should probably be done in a
>>>> 'fast' manner, so crmd does not stop for too long and the rest of
>>>> pacemaker does not see it as 'hung'. Again, I have not tried that, and I
>>>> do not know whether it's even possible to do with crmd.
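
A rough sketch of how that could look from the shell, assuming crmd is already
running under valgrind (e.g. via the sysconfig settings sketched earlier) with
--vgdb=yes added to VALGRIND_OPTS; vgdb is valgrind's standalone helper and can
send monitor commands without attaching gdb (the output path and interval below
are just examples):

  # Ask memcheck for a full leak report, including still-reachable blocks,
  # once a day; crmd is only paused for the duration of the command.
  # With a single valgrind'd process on the node, --pid=<pid> can be omitted.
  while sleep 86400; do
      date >> /var/tmp/crmd-leak-history.log
      vgdb leak_check full reachable any >> /var/tmp/crmd-leak-history.log 2>&1
  done
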
>>>>
>>>> And, as pacemaker heavily utilizes glib, which has its own memory
>>>> allocator (slices), it is better to switch it to standard malloc/free
>>>> for debugging, with the G_SLICE=always-malloc env var.
>>>>
>>>> Lastly, I did memleak checks for a 'static' cluster (i.e. no operations
>>>> except monitors are performed) on ~1.1.8, and did not find any. It would
>>>> be interesting to see whether that is also true for an 'active' one,
>>>> which starts/stops resources, handles failures, etc.
>>>>
>>>>>
>>>>> Sincerely,
>>>>> Yuichi
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Also, the measurements are in pages... could you run "getconf PAGESIZE" and let us know the result?
>>>>>>> I'm guessing 4096 bytes.
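
(Just for the conversion: if the counters really are pages, then
kB = pages * PAGESIZE / 1024, e.g. for the earlier crmd figure, assuming it is
indeed in pages:)

  getconf PAGESIZE                                # typically prints 4096
  echo $(( 5332 * $(getconf PAGESIZE) / 1024 ))   # 5332 pages -> 21328 kB
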
>>>>>>>
>>>>>>> On 23/05/2013, at 5:47 PM, Yuichi SEINO <seino.cluster2 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I retried the test after we updated the packages to the latest tags and the OS.
>>>>>>>> glue and booth are the latest.
>>>>>>>>
>>>>>>>> * Environment
>>>>>>>> OS:RHEL 6.4
>>>>>>>> cluster-glue:latest(commit:2755:8347e8c9b94f) +
>>>>>>>> patch[detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787]
>>>>>>>> resource-agent:v3.9.5
>>>>>>>> libqb:v0.14.4
>>>>>>>> corosync:v2.3.0
>>>>>>>> pacemaker:v1.1.10-rc2
>>>>>>>> crmsh:v1.2.5
>>>>>>>> booth:latest(commit:67e1208973de728958432aaba165766eac1ce3a0)
>>>>>>>>
>>>>>>>> * Test procedure
>>>>>>>> We regularly switch a ticket. The previous test also used the same procedure.
>>>>>>>> And there was no memory leak when we tested pacemaker-1.1, before
>>>>>>>> pacemaker used libqb.
>>>>>>>>
>>>>>>>> * Result
>>>>>>>> As a result, I think that crmd may be causing the memory leak.
>>>>>>>>
>>>>>>>> crmd smaps (a total over all address ranges)
>>>>>>>> In detail, we attached the smaps from the start and the end. And I
>>>>>>>> recorded smaps every 1 minute.
>>>>>>>>
>>>>>>>> Start
>>>>>>>> RSS: 7396
>>>>>>>> SHR(Shared_Clean+Shared_Dirty):3560
>>>>>>>> Private(Private_Clean+Private_Dirty):3836
>>>>>>>>
>>>>>>>> Interval (about 30h later)
>>>>>>>> RSS:18464
>>>>>>>> SHR:14276
>>>>>>>> Private:4188
>>>>>>>>
>>>>>>>> End(about 70h later)
>>>>>>>> RSS:19104
>>>>>>>> SHR:14336
>>>>>>>> Private:4768
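
(For reference, a one-liner along these lines produces such totals from an
smaps dump; the label names in the output are mine:)

  # Sum Rss, Shared_* and Private_* (all in kB) over every mapping of crmd.
  awk '/^Rss:/                   {rss  += $2}
       /^Shared_(Clean|Dirty):/  {shr  += $2}
       /^Private_(Clean|Dirty):/ {priv += $2}
       END {print "RSS:", rss, "SHR:", shr, "Private:", priv}' /proc/$(pidof crmd)/smaps
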
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Yuichi
>>>>>>>>
>>>>>>>> 2013/5/15 Yuichi SEINO <seino.cluster2 at gmail.com>:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I ran the test for about two days.
>>>>>>>>>
>>>>>>>>> Environment
>>>>>>>>>
>>>>>>>>> OS:RHEL 6.3
>>>>>>>>> pacemaker-1.1.9-devel (commit 138556cb0b375a490a96f35e7fbeccc576a22011)
>>>>>>>>> corosync-2.3.0
>>>>>>>>> cluster-glue latest+patch(detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787)
>>>>>>>>> libqb- 0.14.4
>>>>>>>>>
>>>>>>>>> There may be a memory leak in crmd and lrmd. I regularly recorded the RSS from ps.
>>>>>>>>>
>>>>>>>>> start-up
>>>>>>>>> crmd:5332
>>>>>>>>> lrmd:3625
>>>>>>>>>
>>>>>>>>> interval(about 30h later)
>>>>>>>>> crmd:7716
>>>>>>>>> lrmd:3744
>>>>>>>>>
>>>>>>>>> ending(about 60h later)
>>>>>>>>> crmd:8336
>>>>>>>>> lrmd:3780
>>>>>>>>>
>>>>>>>>> I have not yet run a test using pacemaker-1.1.10-rc2, so I will run that test.
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Yuichi
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Yuichi SEINO
>>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Yuichi SEINO
>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>>>> <smaps_log.tar.gz>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Yuichi SEINO
>>>>> METROSYSTEMS CORPORATION
>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
> 
> 
> 




