[Pacemaker] socket is incremented after running crm shell

Junko IKEDA tsukishima.ha at gmail.com
Fri Mar 30 02:07:37 EDT 2012


Hi,

I encountered the following error message while running the crm shell.

crmd: [6837]: ERROR: socket_accept_connection: accept(sock=10): Too many open files
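
"Too many open files" is the error text for EMFILE, which means the process has reached its per-process limit on open descriptors. A minimal sketch for checking how close a process is to that limit, assuming a Linux /proc filesystem (fd_headroom is a hypothetical helper name, not part of Pacemaker):

```shell
# fd_headroom PID: print the open descriptor count and soft limit for PID.
# Assumes Linux: /proc/PID/fd and /proc/PID/limits must be readable,
# which normally means running as root or as the owner of the process.
fd_headroom() {
    pid=$1
    # One entry per open descriptor; strip any padding wc may add.
    used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # In the "Max open files" row the soft limit is the 4th field.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid open_fds=$used soft_limit=$soft"
}

# Example, on the DC:
#   fd_headroom "$(pgrep crmd)"
```

If open_fds keeps climbing toward soft_limit, the accept() failure above would be the expected end result.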

The same error is discussed here:
https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2626
However, that report appears to be specific to Ubuntu and upstart,
so my case is probably a different issue.

I can reproduce a similar case with the steps below.
In this test, "bl460g6b" is the DC,
and the number of files crmd holds open increases by one, on the DC only, after running the crm shell.


* Initial status

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:26:04 JST 2012
bl460g6a
42

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:26:16 JST 2012
bl460g6b
46
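
Note that "lsof -p" output also includes a header line and rows that are not descriptors (cwd, txt, mem), so the absolute numbers overstate the descriptor count, although the deltas remain meaningful. To check whether the growing entries really are sockets, the descriptors can be tallied by kind. This is only a sketch, assuming Linux /proc; fd_types is a hypothetical helper name:

```shell
# fd_types PID: tally PID's open descriptors by kind (socket, pipe, other).
# Assumes Linux, where each /proc/PID/fd entry is a symlink whose target
# looks like "socket:[12345]", "pipe:[678]", or an ordinary path.
fd_types() {
    for fd in /proc/"$1"/fd/*; do
        # Skip entries that vanish or cannot be read.
        target=$(readlink "$fd") || continue
        case $target in
            socket:*) echo socket ;;
            pipe:*)   echo pipe ;;
            *)        echo other ;;
        esac
    done | sort | uniq -c
}

# Example, on each node:
#   date; hostname; fd_types "$(pgrep crmd)"
```

Comparing the output before and after each step below would show which kind of descriptor accounts for the +1.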



* Load the resource configuration (just starts one Dummy RA)

# crm configure load update cib.crm
# crm_mon -1

============
Last updated: Fri Mar 30 14:26:52 2012
Stack: Heartbeat
Current DC: bl460g6b (22222222-2222-2222-2222-222222222222) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ bl460g6a bl460g6b ]

dummy01 (ocf::pacemaker:Dummy): Started bl460g6a

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:27:10 JST 2012
bl460g6a
42

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:27:16 JST 2012
bl460g6b
47 <==== +1



* Add a second resource

# crm configure primitive dummy02 ocf:pacemaker:Dummy

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:32:54 JST 2012
bl460g6a
42

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:32:58 JST 2012
bl460g6b
48 <==== +1



* Stop the second resource

# crm resource stop dummy02

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:34:05 JST 2012
bl460g6a
42

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:34:00 JST 2012
bl460g6b
48 <==== +0 ???



* Delete the second resource

# crm configure delete dummy02

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:35:31 JST 2012
bl460g6a
42

# date; hostname; lsof -p $(pgrep crmd) | wc -l
Fri Mar 30 14:35:15 JST 2012
bl460g6b
49  <==== +1
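
The before/after comparison in the steps above can be wrapped in a small helper. This is a sketch under assumptions: it needs Linux /proc, fd_delta and the FD_DELTA_SETTLE variable are hypothetical names, and the 5 second default delay is a guess at how long the DC needs to finish processing the change.

```shell
# fd_delta PID CMD...: run CMD, then report how PID's descriptor count changed.
# Assumes Linux /proc. FD_DELTA_SETTLE (seconds, default 5) is a hypothetical
# knob: the wait before the second measurement, so the cluster can settle.
fd_delta() {
    pid=$1
    shift
    before=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    "$@"
    sleep "${FD_DELTA_SETTLE:-5}"
    after=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    echo "before=$before after=$after delta=$((after - before))"
}

# Example, on the DC:
#   fd_delta "$(pgrep crmd)" crm configure primitive dummy02 ocf:pacemaker:Dummy
#   fd_delta "$(pgrep crmd)" crm resource stop dummy02
#   fd_delta "$(pgrep crmd)" crm configure delete dummy02
```

Because this counts /proc/PID/fd entries rather than lsof rows, the absolute numbers will be lower than the ones listed above, but the deltas should be comparable.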


I have attached the hb_report.
The test above was run with Pacemaker 1.0.12;
Pacemaker 1.1.6 shows the same result.

Thanks,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report.tar.bz2
Type: application/x-bzip2
Size: 46251 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20120330/74c7e636/attachment-0002.bz2>

