[Pacemaker] pacemaker processes RSS growth

Thu Sep 6 05:39:15 EDT 2012

On Thu, Sep 6, 2012 at 5:33 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 06.09.2012 10:19, Andrew Beekhof wrote:
>> On Thu, Sep 6, 2012 at 5:14 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>> Hi,
>>>
>>> I noticed that some pacemaker processes grow during operation (commit
>>> 8535316). Running on top of corosync 2.0.1.
>>> I recorded the RSS size (RES as htop reports it) twice, ~18 hours apart.
>>> The first column was recorded after ~1 hour of operation.
>>>
>>> Results are:
>>> daemon       first       second
>>> pengine      23568       23780
>>> crmd         15592       17420
>>> cib          12356       12380
>>> lrmd          4396       14128
>>> stonithd      3812        3812
>>> attrd         3240        3244
>>> pacemakerd    3104        3104

What unit are these values?
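
For what it's worth, the kernel reports resident set size in kB on the
VmRSS line of /proc/<pid>/status, so sampling that directly avoids any
doubt about how a tool displays it. A minimal sketch, assuming a Linux
/proc and a pidof that supports -s; the function names rss_kb and
sample_daemons are made up for illustration and are not something
pacemaker ships:

```shell
#!/bin/sh
# rss_kb PID: print the resident set size of PID in kB, taken from the
# VmRSS line of /proc/PID/status (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# sample_daemons: print one line per running daemon from the list above.
# Daemons that are not running are silently skipped.
sample_daemons() {
    for name in pengine crmd cib lrmd stonithd attrd pacemakerd; do
        pid=$(pidof -s "$name" 2>/dev/null) || continue
        printf '%-12s %8s kB\n' "$name" "$(rss_kb "$pid")"
    done
}
```

Calling sample_daemons from cron (or a sleep loop) and appending the
output to a file gives more than two data points, which makes a steady
leak easier to tell apart from a one-time jump.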

>>>
>>> Cluster is totally static, except cluster-recheck-interval is set to
>>> 3min. No actions were taken after the first sample.
>>>
>>> This makes me think of some slow memory leaks in crmd and lrmd, but I
>>> can't say that for sure because of glib.
>>>
>>> I do not know if CTS or Coverity cover this, so I can try to run under
>>> valgrind if somebody gives me instructions on how to do that.
>>
>> Check the bottom of /etc/sysconfig/pacemaker :-)
>
> Heh :)
>
>> Valgrind takes a heavy toll though... perhaps set
>
> I know. That is a test cluster, so it doesn't matter.
>
>>
>> export G_SLICE=always-malloc
>
> Yes, I know. That does not help with one-time glib
> initializations though, such as the GType one.

Right, but they should be accounted for already in the first column.

> Maybe you have appropriate
> .supp files for all daemons? I see only cts.supp, ptest.supp and cli.supp.

The PE shouldn't be leaking at all.
Every 5th commit to master runs the PE and cli (which also exercises
most of the cib) regression tests with valgrind and it hasn't reported
anything.
Stonithd appears to be in the clear too.

The only one I'm really concerned about is the lrmd.
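
To put numbers on that: taking the difference between the two quoted
columns, lrmd grew by 9732 (more than tripling), crmd by 1828, and
nothing else by more than 212. A small sketch of that arithmetic, using
only the figures quoted above; the function name rss_growth is
illustrative:

```shell
#!/bin/sh
# Growth per daemon between the two quoted samples.
# Input columns: name, first sample, second sample (unit as reported).
rss_growth() {
    awk '{ printf "%-12s %6d\n", $1, $3 - $2 }' <<'EOF'
pengine 23568 23780
crmd 15592 17420
cib 12356 12380
lrmd 4396 14128
stonithd 3812 3812
attrd 3240 3244
pacemakerd 3104 3104
EOF
}

rss_growth
```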

>
> Should I do a full cluster restart, or is a rolling one OK?

To have the sysconfig values take effect?  Either.
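
Roughly, the relevant block at the bottom of /etc/sysconfig/pacemaker
looks like the sketch below. Treat it as an assumption to check against
the file your build installed: the PCMK_valgrind_enabled and
VALGRIND_OPTS names are given from memory, and the log path is only a
placeholder.

```shell
# Make glib use plain malloc/free so leaks are visible to malloc-level tools.
export G_SLICE=always-malloc

# Assumed variable name: run only the listed daemon(s) under valgrind.
export PCMK_valgrind_enabled=lrmd

# Options passed to valgrind; the log-file path is a placeholder
# (%p is expanded by valgrind to the process ID).
export VALGRIND_OPTS="--leak-check=full --num-callers=25 --log-file=/var/tmp/valgrind-%p"
```

Limiting valgrind to the lrmd keeps the overhead away from the daemons
that already get checked by the regression tests.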

>
>>
>> first to rule out glib's funky allocator.
>>
>>>
>>> I can send CIB contents if needed.
>>>
>>> Vladislav
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org