[Pacemaker] Pacemaker still may include memory leaks

Tue May 28 02:30:29 EDT 2013

On 28/05/2013, at 10:12 AM, Andrew Beekhof <andrew at beekhof.net> wrote:

> 
> On 27/05/2013, at 5:08 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 
>> 27.05.2013 04:20, Yuichi SEINO wrote:
>>> Hi,
>>> 
>>> 2013/5/24 Vladislav Bogdanov <bubble at hoster-ok.com>:
>>>> 24.05.2013 06:34, Andrew Beekhof wrote:
>>>>> Any help figuring out where the leaks might be would be very much appreciated :)
>>>> 
>>>> One (and the only) suspect is unfortunately crmd itself.
>>>> Its private heap has grown from 2708 to 3680 kB.
>>>> 
>>>> All other relevant differences are in qb shm buffers, which are
>>>> controlled and may grow until they reach configured size.
>>>> 
>>>> @Yuichi
>>>> I would recommend running under valgrind on a test cluster to
>>>> figure out whether that is a memleak (lost memory) or some history data
>>>> (referenced memory). The latter may be a logical memleak, though. You may
>>>> look in /etc/sysconfig/pacemaker for details.
>>> 
>>> I ran valgrind for about 2 days, and I have attached the valgrind
>>> output from the ACT node and the SBY node.
>> 
>> 
>> I do not see any "direct" memory leaks (repeating 'definitely-lost'
>> allocations) there.
>> 
>> So what we see is probably one of:
>> * Cache/history/etc., which grows up to some limit (or expires at
>> some point in time).
>> * Unlimited/not-expirable lists/hashes of data structures, which are
>> correctly freed at exit
> 
> There are still plenty of memory chunks not free'd at exit; I'm slowly working through those.

I've pushed the following to my repo:

+ Andrew Beekhof (2 hours ago) d070092: Test: More glib suppressions 
+ Andrew Beekhof (2 hours ago) ec74bf0: Fix: Fencing: Ensure API object is consistently free'd 
+ Andrew Beekhof (2 hours ago) 6130d23: Fix: Free additional memory at exit 
+ Andrew Beekhof (2 hours ago) b76d6be: Refactor: crmd: Allocate a mainloop before doing anything to help valgrind 
+ Andrew Beekhof (3 hours ago) d4041de: Log: init: Remove unnecessary detail from shutdown message 
+ Andrew Beekhof (3 hours ago) 282032b: Fix: Clean up internal mainloop structures at exit 
+ Andrew Beekhof (4 hours ago) 0947721: Fix: Core: Correctly unreference GSource inputs 
+ Andrew Beekhof (25 hours ago) d94140d: Fix: crmd: Clean up more memory before exit 
+ Andrew Beekhof (25 hours ago) b44257c: Test: cman: Ignore additional valgrind errors 

If someone would like to run the cluster (no valgrind needed) for a while with

export PCMK_trace_functions=mainloop_gio_destroy,mainloop_add_fd,mainloop_del_fd,crmd_exit,crm_peer_destroy,empty_uuid_cache,lrm_state_destroy_all,internal_lrm_state_destroy,do_stop,mainloop_destroy_trigger,mainloop_setup_trigger,do_startup,stonith_api_delete

and then (after grabbing smaps) shut it down, we should have some information about any lists/hashes that are growing too large.

Also, be sure to run with:

export G_SLICE=always-malloc

which will prevent glib from accumulating pools of memory and distorting any results.
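
For anyone comparing smaps snapshots by hand, here is a minimal Python sketch of the totalling used earlier in this thread (RSS, Shared_Clean+Shared_Dirty, Private_Clean+Private_Dirty). The function names are hypothetical and not part of Pacemaker, and it assumes the usual /proc/<pid>/smaps layout with counters reported in kB.

```python
import re
from collections import defaultdict

# Matches smaps counter lines such as "Private_Dirty:        12 kB".
# Mapping header lines (address ranges) and "VmFlags:" lines do not match.
_FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def smaps_totals(text):
    """Sum every kB counter across all mappings in one smaps dump."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return dict(totals)


def summarize(totals):
    """Collapse the totals into the RSS / SHR / Private figures (hypothetical helper)."""
    return {
        "RSS": totals.get("Rss", 0),
        "SHR": totals.get("Shared_Clean", 0) + totals.get("Shared_Dirty", 0),
        "Private": totals.get("Private_Clean", 0) + totals.get("Private_Dirty", 0),
    }


def growth(before, after):
    """Per-category growth in kB between two summaries."""
    return {key: after[key] - before[key] for key in before}
```

Calling summarize(smaps_totals(open(path).read())) on a saved snapshot gives one row of figures. Applied to the start and end figures quoted below, growth() reports Private up by 932 kB and SHR up by 10776 kB over roughly 70 hours.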

> Once we know all memory is being cleaned up, the next step is to check the size of things beforehand.
> 
> I'm hoping one or more of them show up as unnaturally large, indicating things are being added but not removed.
> 
>> (e.g. like dlm_controld has (or had?) for a
>> debugging buffer, or like the glibc resolver had in EL3). This cannot be
>> caught with valgrind if you use it in the standard way.
>> 
>> I believe we have the former. To prove that, it would be very
>> interesting to run under the valgrind *debugger* (--vgdb=yes|full) for a
>> sufficiently long period (2-3 weeks) and periodically get the memory
>> allocation state from there (with the 'monitor leak_check full reachable
>> any' gdb command). I wanted to do that a long time ago, but
>> unfortunately did not have enough spare time to even try it (although
>> I have tried to valgrind other programs that way).
>> 
>> This is described in valgrind documentation:
>> http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
>> 
>> We probably do not need to specify '--vgdb-error=0' because we do not
>> need to install watchpoints at the start (and we do not need/want to
>> immediately connect to crmd with gdb to tell it to continue), we just
>> need to periodically get status of memory allocations
>> (stop-leak_check-cont sequence). That should probably be done
>> quickly, so crmd does not stop for a long time and the rest of
>> pacemaker does not see it as 'hung'. Again, I did not try that, and I do
>> not know if it's even possible to do that with crmd.
>> 
>> And, as pacemaker heavily utilizes glib, which has its own memory allocator
>> (slices), it is better to switch it to 'standard' malloc/free for
>> debugging with the G_SLICE=always-malloc env var.
>> 
>> Last, I did memleak checks for a 'static' (i.e. no operations except
>> monitors are performed) cluster for ~1.1.8, and did not find any. It
>> would be interesting to see if that is true for an 'active' one, which
>> starts/stops resources, handles failures, etc.
>> 
>>> 
>>> Sincerely,
>>> Yuichi
>>> 
>>>> 
>>>>> 
>>>>> Also, the measurements are in pages... could you run "getconf PAGESIZE" and let us know the result?
>>>>> I'm guessing 4096 bytes.
>>>>> 
>>>>> On 23/05/2013, at 5:47 PM, Yuichi SEINO <seino.cluster2 at gmail.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I retried the test after we updated the packages to the latest tags
>>>>>> and updated the OS. glue and booth are at their latest commits.
>>>>>> 
>>>>>> * Environment
>>>>>> OS:RHEL 6.4
>>>>>> cluster-glue:latest(commit:2755:8347e8c9b94f) +
>>>>>> patch[detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787]
>>>>>> resource-agent:v3.9.5
>>>>>> libqb:v0.14.4
>>>>>> corosync:v2.3.0
>>>>>> pacemaker:v1.1.10-rc2
>>>>>> crmsh:v1.2.5
>>>>>> booth:latest(commit:67e1208973de728958432aaba165766eac1ce3a0)
>>>>>> 
>>>>>> * Test procedure
>>>>>> We regularly switch a ticket. The previous test used the same procedure.
>>>>>> There was no memory leak when we tested pacemaker-1.1 before
>>>>>> pacemaker started using libqb.
>>>>>> 
>>>>>> * Result
>>>>>> As a result, I think that crmd may cause the memory leak.
>>>>>> 
>>>>>> crmd smaps (totals across all address ranges)
>>>>>> For details, we attached the smaps from the start and the end. I
>>>>>> recorded smaps every minute.
>>>>>> 
>>>>>> Start
>>>>>> RSS: 7396
>>>>>> SHR(Shared_Clean+Shared_Dirty):3560
>>>>>> Private(Private_Clean+Private_Dirty):3836
>>>>>> 
>>>>>> Interval (about 30h later)
>>>>>> RSS:18464
>>>>>> SHR:14276
>>>>>> Private:4188
>>>>>> 
>>>>>> End(about 70h later)
>>>>>> RSS:19104
>>>>>> SHR:14336
>>>>>> Private:4768
>>>>>> 
>>>>>> Sincerely,
>>>>>> Yuichi
>>>>>> 
>>>>>> 2013/5/15 Yuichi SEINO <seino.cluster2 at gmail.com>:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I ran the test for about two days.
>>>>>>> 
>>>>>>> Environment
>>>>>>> 
>>>>>>> OS:RHEL 6.3
>>>>>>> pacemaker-1.1.9-devel (commit 138556cb0b375a490a96f35e7fbeccc576a22011)
>>>>>>> corosync-2.3.0
>>>>>>> cluster-glue latest+patch(detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787)
>>>>>>> libqb-0.14.4
>>>>>>> 
>>>>>>> There may be a memory leak in crmd and lrmd. I regularly recorded the RSS reported by ps.
>>>>>>> 
>>>>>>> start-up
>>>>>>> crmd:5332
>>>>>>> lrmd:3625
>>>>>>> 
>>>>>>> interval(about 30h later)
>>>>>>> crmd:7716
>>>>>>> lrmd:3744
>>>>>>> 
>>>>>>> ending(about 60h later)
>>>>>>> crmd:8336
>>>>>>> lrmd:3780
>>>>>>> 
>>>>>>> I have not yet run the test with pacemaker-1.1.10-rc2, so I will run that next.
>>>>>>> 
>>>>>>> Sincerely,
>>>>>>> Yuichi
>>>>>>> 
>>>>>>> --
>>>>>>> Yuichi SEINO
>>>>>>> METROSYSTEMS CORPORATION
>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Yuichi SEINO
>>>>>> METROSYSTEMS CORPORATION
>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>> <smaps_log.tar.gz>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>> 
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Yuichi SEINO
>>> METROSYSTEMS CORPORATION
>>> E-mail:seino.cluster2 at gmail.com
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>