[ClusterLabs] crm_mon memory leak

Jan Pokorný jpokorny at redhat.com
Wed Nov 18 19:37:42 UTC 2015


On 09/11/15 13:11 +0000, Karthikeyan Ramasamy wrote:
> root     13405     1  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     13566 13405  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     13623 13566  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     13758 13566  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     13784 13623  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14146 13566  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14167 13623  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14193 13784  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14284 13758  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14381 13784  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14469 14284  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14589 13405  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14837 14381  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14860 13566  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     14977 14589  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     19816 14167  0 13:43 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root     19845 19816  0 13:43 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> 
> From the above it looks like each crm_mon spawns further crm_mon processes, and the number keeps growing.

Yep, see the attached PID scheme.
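
For those who cannot open the attachment, the same parent/child picture
can be recovered from the listing itself. A rough sketch, with the
PID/PPID pairs copied by hand from the quoted ps output (the variable
names are mine, nothing here comes from crm_mon):

```shell
#!/bin/sh
# PID/PPID pairs (columns 2 and 3) copied from the quoted ps listing.
pairs='13405 1
13566 13405
13623 13566
13758 13566
13784 13623
14146 13566
14167 13623
14193 13784
14284 13758
14381 13784
14469 14284
14589 13405
14837 14381
14860 13566
14977 14589
19816 14167
19845 19816'

# Count direct children per parent PID, most prolific parent first.
children=$(printf '%s\n' "$pairs" |
    awk '{ n[$2]++ } END { for (p in n) print n[p], p }' |
    sort -k1,1nr -k2,2n)
printf '%s\n' "$children"
```

PID 13566 alone has four direct children, and several of its children
have children of their own. On a live system the pairs could presumably
be taken from something like "ps -e -o pid= -o ppid=" instead.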

My guess is that the script PCSESA.sh is in fact an accidental "soft" fork
bomb that could be reduced to something like this t.sh script:

printf '%s\n' '#!/bin/sh' 'while true; do sleep 15; (eval "$0" "$@" &); done' > t.sh
chmod +x t.sh
./t.sh --foo bar
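
To get a feeling for why this is only a "soft" fork bomb: assuming every
running copy wakes up after its 15-second sleep and starts exactly one
more copy, the number of processes doubles each interval. The loop below
is just that arithmetic (an idealized model, not a measurement) and does
not start any processes itself:

```shell
#!/bin/sh
# Idealized growth model for t.sh: one new copy per running copy
# every 15 seconds, so the population doubles each interval.
interval=15
n=1   # the initially started ./t.sh
i=0
while [ "$i" -lt 8 ]; do
    i=$((i + 1))
    n=$((n * 2))
    echo "after $((i * interval))s: $n processes"
done
```

In practice the timing will drift and the system will hit its limits
well before the model does, but the trend is the point.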

What puzzles me, though, is that the same PID file, reused across the
nested executions, does not prevent this sort of recursion. I am wondering
whether "open(..., ... | O_SYNC)" or an explicit fsync() after the write
would be of any help here (it smells like a filesystem-level race condition).
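
To illustrate the kind of race I have in mind (I have not checked how
crm_mon actually handles its PID file, so this is a generic sketch, and
acquire_pidfile is a made-up helper): a "check whether the file exists,
then write it" sequence leaves a window in which two processes can both
pass the check. Creating the file exclusively, which is a different
remedy than O_SYNC/fsync, closes that window. In sh this can be
approximated with noclobber:

```shell
#!/bin/sh
# Hypothetical helper: claim a PID file only if it does not exist yet.
# With noclobber (set -C) the redirection fails when the file already
# exists, so "check" and "create" are not two separate steps.
acquire_pidfile() {
    pidfile=$1
    if ( set -C; echo "$$" > "$pidfile" ) 2>/dev/null; then
        return 0
    fi
    return 1
}

# Demo on a throwaway path (an assumption, not the crm_mon PID file).
demo_pidfile="${TMPDIR:-/tmp}/pidfile_demo.$$"
rm -f "$demo_pidfile"
if acquire_pidfile "$demo_pidfile"; then first=acquired; else first=refused; fi
if acquire_pidfile "$demo_pidfile"; then second=acquired; else second=refused; fi
echo "first=$first second=$second"
rm -f "$demo_pidfile"
```

A real implementation would also have to deal with stale PID files left
behind by a crashed process, which this sketch ignores.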

> Can you please let us know if there is anything else we have to
> check or still there could be issues with the script?

Karthik, would you be able to provide a somewhat reduced version of
PCSESA.sh (as requested by Ken) that still reproduces the issue?

-- 
Jan (Poki)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: proc_tree.svg
Type: image/svg+xml
Size: 10646 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20151118/084402bb/attachment-0004.svg>