[ClusterLabs] Pacemaker failed to restart subprocess of host if container also uses pacemaker cluster!

Fri Nov 16 03:33:09 EST 2018

I have run into a problem with pacemaker: it fails to restart its child processes on the host if a container on that host also runs a pacemaker cluster.

The environment is as follows:
1. corosync version 2.4.0, pacemaker version 1.1.16
2. A three-node cluster; the container also runs a pacemaker cluster
Because of this issue, the cluster cannot work normally after a node is restarted or the pacemakerd process is restarted.
I ran a test: I stopped corosync (which leads to a pacemaker restart). The logs are as follows:
///////stop corosync//////
[ubuntu@paas-controller-208-1-0-40:~]$ sudo su
[root@paas-controller-208-1-0-40:/home/ubuntu]$ service corosync stop
[root@paas-controller-208-1-0-40:/home/ubuntu]$ ps -elf | grep pacemaker
4 S root 16613 14434 0 80 0 - 26569 poll_s 19:09 pts/2 00:00:00 /usr/sbin/pacemakerd
4 S haclust+ 16619 16613 0 80 0 - 27481 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/cib
4 S root 16620 16613 0 80 0 - 27454 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/stonithd
4 S root 16622 16613 0 80 0 - 19155 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/lrmd
4 S haclust+ 16623 16613 0 80 0 - 25141 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/attrd
4 S haclust+ 16624 16613 0 80 0 - 20618 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/pengine
4 S haclust+ 16625 16613 0 80 0 - 29743 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/crmd
4 S root 16628 14465 0 80 0 - 26569 poll_s 19:09 pts/3 00:00:00 /usr/sbin/pacemakerd
4 S haclust+ 16631 16628 0 80 0 - 27357 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/cib
4 S root 16632 16628 0 80 0 - 27455 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/stonithd
4 S root 16633 16628 0 80 0 - 19155 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/lrmd
4 S haclust+ 16634 16628 0 80 0 - 25142 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/attrd
4 S haclust+ 16635 16628 0 80 0 - 20618 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/pengine
4 S haclust+ 16636 16628 0 80 0 - 29743 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/crmd
4 S root 23559 1 0 80 0 - 20416 hrtime 19:10 ? 00:00:00 /usr/sbin/pacemakerd -f
4 S root 25105 11245 0 80 0 - 28203 pipe_w 19:10 pts/5 00:00:00 grep --color=auto pacemaker
4 S root 31529 1 0 80 0 - 19012 poll_s 14:41 ? 00:00:40 /usr/libexec/pacemaker/lrmd
4 S haclust+ 31531 1 0 80 0 - 24467 poll_s 14:41 ? 00:00:29 /usr/libexec/pacemaker/pengine
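
The listing above is easier to read when the daemons are grouped by their parent pacemakerd. The following is a minimal Python sketch of that grouping, not a Pacemaker tool: the function names and the EXPECTED set are illustrative assumptions, with the daemon names taken from the listing itself. The sample text holds only the three lines from the listing whose parent PID is 1.

```python
from collections import defaultdict

# Assumption: the daemons a pacemakerd 1.1.x instance is expected to spawn,
# taken from the names that appear in the ps listing above.
EXPECTED = {"cib", "stonithd", "lrmd", "attrd", "pengine", "crmd"}

# Three lines copied from the listing above (the processes with PPID 1).
SAMPLE = """\
4 S root 23559 1 0 80 0 - 20416 hrtime 19:10 ? 00:00:00 /usr/sbin/pacemakerd -f
4 S root 31529 1 0 80 0 - 19012 poll_s 14:41 ? 00:00:40 /usr/libexec/pacemaker/lrmd
4 S haclust+ 31531 1 0 80 0 - 24467 poll_s 14:41 ? 00:00:29 /usr/libexec/pacemaker/pengine
"""


def parse_ps(text):
    """Return (pid, ppid, name) for each pacemaker process in `ps -elf` output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # ps -elf columns: F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
        if len(fields) < 15 or not (fields[3].isdigit() and fields[4].isdigit()):
            continue
        cmd = fields[14]
        if "pacemaker" not in cmd:
            continue  # skips the `grep --color=auto pacemaker` line, among others
        rows.append((int(fields[3]), int(fields[4]), cmd.rsplit("/", 1)[-1]))
    return rows


def missing_daemons(rows):
    """Map each pacemakerd PID to the expected daemons absent from its children."""
    children = defaultdict(set)
    for pid, ppid, name in rows:
        if name != "pacemakerd":
            children[ppid].add(name)
    return {
        pid: sorted(EXPECTED - children[pid])
        for pid, _, name in rows
        if name == "pacemakerd"
    }


def orphans(rows):
    """Return (pid, name) for daemons whose parent is not a listed pacemakerd."""
    masters = {pid for pid, _, name in rows if name == "pacemakerd"}
    return sorted(
        (pid, name)
        for pid, ppid, name in rows
        if name != "pacemakerd" and ppid not in masters
    )


rows = parse_ps(SAMPLE)
print(missing_daemons(rows))
print(orphans(rows))
```

Run against the full listing, this grouping shows the pacemakerd instances 16613 and 16628 each with all six daemons, the instance started with -f (PID 23559) with no children at all, and an lrmd and a pengine started at 14:41 that are parented to PID 1.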

Some pacemaker processes (crmd, attrd, cib, stonithd) seem to be lost, even after I restart pacemaker (service pacemaker start).
Does anyone know how to deal with this? Thank you very much!

Ma Jinfeng

Communication Protocol Software Development Engineer

NIV Dept. II/Wireless Product R&D Institute/Wireless Product Operation Division

ZTE Corporation

D2070, ZTE, No. 889 Bibo Road, Pudong New Area, Shanghai

T: +86 021 xxxxxxxx      M: +86 17601320963

E: ma.jinfeng at zte.com.cn

www.zte.com.cn