[Pacemaker] Pacemaker still may include memory leaks

Vladislav Bogdanov bubble at hoster-ok.com
Fri May 24 06:29:22 UTC 2013


24.05.2013 06:34, Andrew Beekhof wrote:
> Any help figuring out where the leaks might be would be very much appreciated :)

The one (and only) suspect is, unfortunately, crmd itself.
Its private heap has grown from 2708 to 3680 kB.

All other relevant differences are in the libqb shared-memory buffers,
which are bounded and may only grow until they reach their configured
size.
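
For reference, these per-category totals can be reproduced from the
attached smaps dumps roughly like this (just a sketch; the field names
are the ones /proc/<pid>/smaps prints, and all values are in kB):

  pid=$(pidof crmd)   # assumes a single crmd process
  awk '/^Rss:/                  { rss += $2 }
       /^Shared_(Clean|Dirty):/  { shr += $2 }
       /^Private_(Clean|Dirty):/ { prv += $2 }
       END { printf "RSS: %d  SHR: %d  Private: %d  (kB)\n", rss, shr, prv }' \
      "/proc/$pid/smaps"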

@Yuichi
I would recommend running crmd under valgrind on a testing cluster to
figure out whether this is a real memory leak (lost memory) or
accumulated history data (still-referenced memory). The latter may be
a logical memory leak, though. You may look in /etc/sysconfig/pacemaker
for details on how to enable it.
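
Something along these lines (variable names as shipped in the 1.1.x
sysconfig file; check the comments in your copy, the log path below is
only an example, and the affected daemon has to be restarted afterwards):

  # /etc/sysconfig/pacemaker (fragment) - run only crmd under valgrind
  PCMK_valgrind_enabled=crmd
  VALGRIND_OPTS="--leak-check=full --num-callers=25 \
                 --log-file=/var/lib/pacemaker/valgrind-%p"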

> 
> Also, the measurements are in pages... could you run "getconf PAGESIZE" and let us know the result?
> I'm guessing 4096 bytes.
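
Side note: if any of the reported figures turn out to be page counts
rather than kB, converting them is straightforward, e.g.:

  pagesize=$(getconf PAGESIZE)        # typically 4096 on x86_64
  pages=7396                          # hypothetical figure in pages
  echo "$(( pages * pagesize / 1024 )) kB"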
> 
> On 23/05/2013, at 5:47 PM, Yuichi SEINO <seino.cluster2 at gmail.com> wrote:
> 
>> Hi,
>>
>> I re-ran the test after we updated the packages to the latest tags
>> and updated the OS. cluster-glue and booth are at their latest revisions.
>>
>> * Environment
>> OS:RHEL 6.4
>> cluster-glue:latest(commit:2755:8347e8c9b94f) +
>> patch[detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787]
>> resource-agents:v3.9.5
>> libqb:v0.14.4
>> corosync:v2.3.0
>> pacemaker:v1.1.10-rc2
>> crmsh:v1.2.5
>> booth:latest(commit:67e1208973de728958432aaba165766eac1ce3a0)
>>
>> * Test procedure
>> We regularly switch a ticket between the sites; the previous test used
>> the same procedure. There was no memory leak when we tested
>> pacemaker-1.1, before Pacemaker started using libqb.
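
A hypothetical version of that switching loop (ticket name and interval
are made up here, and a booth-managed setup would normally drive this
through booth's own client rather than directly):

  # grant and revoke a ticket named ticketA every 5 minutes
  while true; do
      crm_ticket --ticket ticketA --grant --force   # --force: manual override
      sleep 300
      crm_ticket --ticket ticketA --revoke --force
      sleep 300
  done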
>>
>> * Result
>> From these results, I think crmd may be causing the memory leak.
>>
>> crmd smaps (totals summed over all address ranges)
>> The smaps output from the start and the end of the run is attached,
>> and I recorded smaps every minute.
>>
>> Start
>> RSS: 7396
>> SHR (Shared_Clean + Shared_Dirty): 3560
>> Private (Private_Clean + Private_Dirty): 3836
>>
>> Interval (about 30h later)
>> RSS: 18464
>> SHR: 14276
>> Private: 4188
>>
>> End (about 70h later)
>> RSS: 19104
>> SHR: 14336
>> Private: 4768
>>
>> Sincerely,
>> Yuichi
>>
>> 2013/5/15 Yuichi SEINO <seino.cluster2 at gmail.com>:
>>> Hi,
>>>
>>> I ran the test for about two days.
>>>
>>> Environment
>>>
>>> OS:RHEL 6.3
>>> pacemaker-1.1.9-devel (commit 138556cb0b375a490a96f35e7fbeccc576a22011)
>>> corosync-2.3.0
>>> cluster-glue latest+patch(detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787)
>>> libqb-0.14.4
>>>
>>> There may be a memory leak in crmd and lrmd. I regularly recorded
>>> their RSS as reported by ps (see the sampling sketch after the
>>> figures below).
>>>
>>> start-up
>>> crmd:5332
>>> lrmd:3625
>>>
>>> interval (about 30h later)
>>> crmd:7716
>>> lrmd:3744
>>>
>>> ending (about 60h later)
>>> crmd:8336
>>> lrmd:3780
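
A sampling loop along these lines presumably produced the figures above
(the interval and log path are purely illustrative):

  # record the resident set size of crmd and lrmd once a minute
  while true; do
      date +%s
      ps -o rss= -o comm= -p "$(pidof crmd)" -p "$(pidof lrmd)"
      sleep 60
  done >> /var/tmp/pacemaker-rss.log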
>>>
>>> I have not yet run a test with pacemaker-1.1.10-rc2, so I will run that test next.
>>>
>>> Sincerely,
>>> Yuichi
>>>
>>> --
>>> Yuichi SEINO
>>> METROSYSTEMS CORPORATION
>>> E-mail:seino.cluster2 at gmail.com
>>
>>
>>
>> -- 
>> Yuichi SEINO
>> METROSYSTEMS CORPORATION
>> E-mail:seino.cluster2 at gmail.com
>> <smaps_log.tar.gz>
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 




