[ClusterLabs] [Problem and Question] If there are too many resources, pacemaker-controld restarts when a re-probe is executed.
renayama19661014@ybb.ne.jp
Thu May 17 16:45:34 EDT 2018
Hi All,
I have built the following environment.
* RHEL7.3 at KVM
* libqb-1.0.2
* corosync 2.4.4
* pacemaker 2.0-rc4
I start the cluster and load a crm file containing 180 Dummy resources.
A third node is not started.
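For reference, the test configuration can be generated with a loop like the one below (a simplified sketch; the group definition shown in crm_mon is omitted, and the monitor interval is just an example):
--------------
# Hypothetical generator for 180 ocf:pacemaker:Dummy primitives.
for i in $(seq 1 180); do
    echo "primitive prmDummy$i ocf:pacemaker:Dummy op monitor interval=10s"
done > dummy180.crm
# Load it into the running cluster with crmsh:
#   crm configure load update dummy180.crm
--------------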
--------------
[root@rh73-01 ~]# crm_mon -1
Stack: corosync
Current DC: rh73-01 (version 2.0.0-3aa2fced22) - partition with quorum
Last updated: Thu May 17 18:44:39 2018
Last change: Thu May 17 18:44:18 2018 by root via cibadmin on rh73-01
2 nodes configured
180 resources configured
Online: [ rh73-01 rh73-02 ]
Active resources:
Resource Group: grpJOS1
prmDummy1 (ocf::pacemaker:Dummy): Started rh73-01
(snip)
prmDummy140 (ocf::pacemaker:Dummy): Started rh73-01
(snip)
prmDummy160 (ocf::pacemaker:Dummy): Started rh73-02
--------------
I execute crm_resource -R after about 120 resources have started on the cluster.
--------------
[root@rh73-01 ~]# crm_resource -R
Waiting for 1 replies from the controller. OK
--------------
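Note that crm_resource -R without -r re-probes every resource on every node, so with 180 resources on 2 nodes this triggers roughly 360 probe operations, and the matching result updates flow back to the CIB almost at once. A lighter-weight alternative (a sketch, not what I actually ran) is to refresh one resource at a time:
--------------
# Re-probe a single resource instead of the whole cluster:
crm_resource --refresh --resource prmDummy1
--------------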
I tried the following 3 patterns.
*******************
Pattern 1) With /etc/sysconfig/pacemaker set as follows:
--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
--------------
After a while, crmd on the DC node fails and is restarted; in the ps output below, pacemaker-controld has a later start time (18:50) than the other daemons (18:43).
[root@rh73-01 ~]# ps -ef |grep pace
root 6751 1 0 18:43 ? 00:00:00 /usr/sbin/pacemakerd -f
haclust+ 6752 6751 2 18:43 ? 00:00:16 /usr/libexec/pacemaker/pacemaker-based
root 6753 6751 0 18:43 ? 00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root 6754 6751 0 18:43 ? 00:00:02 /usr/libexec/pacemaker/pacemaker-execd
haclust+ 6755 6751 0 18:43 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+ 6756 6751 0 18:43 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 20478 6751 0 18:50 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-controld
root 25552 1302 0 18:52 pts/0 00:00:00 grep --color=auto pace
Pattern 2) To work around the problem, I made the following settings.
--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
PCMK_cib_timeout=120
PCMK_ipc_buffer=262144
--------------@crm file
(snip)
property cib-bootstrap-options: \
    cluster-ipc-limit=2000 \
(snip)
--------------
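To confirm that the property was accepted, crm_attribute can query it back from the CIB (the exact output format may differ by version):
--------------
# Query cluster-ipc-limit from the crm_config section:
crm_attribute --type crm_config --name cluster-ipc-limit --query
--------------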
Just like pattern 1, after a while, crmd on the DC node fails and is restarted (pacemaker-controld starts at 19:00 versus 18:57 for the other daemons).
[root@rh73-01 ~]# ps -ef | grep pace
root 3840 1 0 18:57 ? 00:00:00 /usr/sbin/pacemakerd -f
haclust+ 3841 3840 3 18:57 ? 00:00:16 /usr/libexec/pacemaker/pacemaker-based
root 3842 3840 0 18:57 ? 00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root 3843 3840 0 18:57 ? 00:00:01 /usr/libexec/pacemaker/pacemaker-execd
haclust+ 3844 3840 0 18:57 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+ 3845 3840 0 18:57 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 6221 3840 0 19:00 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-controld
root 17974 1302 0 19:05 pts/0 00:00:00 grep --color=auto pace
Pattern 3) To work around the problem, I made the following settings, this time making only PCMK_ipc_buffer smaller than the default.
--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
PCMK_ipc_buffer=20480
--------------
Even after a while, crmd does not restart, and the cluster's resources are all configured as expected.
[root@rh73-01 ~]# ps -ef | grep pace
root 23511 1 0 19:08 ? 00:00:00 /usr/sbin/pacemakerd -f
haclust+ 23512 23511 16 19:08 ? 00:00:19 /usr/libexec/pacemaker/pacemaker-based
root 23513 23511 0 19:08 ? 00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root 23514 23511 0 19:08 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-execd
haclust+ 23515 23511 0 19:08 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+ 23516 23511 3 19:08 ? 00:00:04 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 23517 23511 11 19:08 ? 00:00:13 /usr/libexec/pacemaker/pacemaker-controld
root 28430 1302 0 19:10 pts/0 00:00:00 grep --color=auto pace
*******************
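For comparison, here are the three buffer settings side by side (assuming the compiled-in default of PCMK_ipc_buffer is 131072 bytes):
--------------
# Pattern 1: PCMK_ipc_buffer unset   (default 131072) -> crmd restarts
# Pattern 2: PCMK_ipc_buffer=262144  (2x default)     -> crmd restarts
# Pattern 3: PCMK_ipc_buffer=20480   (~1/6 default)   -> no restart
--------------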
This problem also seems to occur with Pacemaker 1.1.18. With PCMK_fail_fast=yes, this crmd restart causes the node to reboot.
When PCMK_ipc_buffer is made small, crmd does not restart. When it is made larger, crmd does restart, so something may be wrong with Pacemaker.
Isn't there something wrong with Pacemaker?
When the number of resources is large, what settings are appropriate?
* This issue is registered in the following Bugzilla:
- https://bugs.clusterlabs.org/show_bug.cgi?id=5349
Best Regards,
Hideo Yamauchi.