[ClusterLabs] Pacemaker failed to restart subprocess of host if container also uses pacemaker cluster!

Ken Gaillot kgaillot at redhat.com
Tue Nov 20 13:59:13 EST 2018


On Fri, 2018-11-16 at 16:33 +0800, ma.jinfeng at zte.com.cn wrote:
> I have a problem with pacemaker: it fails to restart its
> subprocesses on the host if a container also uses a pacemaker
> cluster!

That might not be supportable with the current code. It's possible to
have a nested cluster with VMs, but containers probably share too much
of the host environment. There was an issue not that long ago with
libqb that led to a new libqb option to use filesystem sockets instead
of Linux native sockets. That might help, but I wouldn't be surprised
if there are more issues.
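
As a rough sketch of what trying that option could look like: to my
recollection, recent libqb releases switch to filesystem sockets when
the marker file /etc/libqb/force-filesystem-sockets exists. Treat that
path as an assumption and check it against the documentation of the
libqb version you have installed. The function name and its optional
root argument below are hypothetical, added only so the step can be
tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch only: create the marker file that (assumption, verify for your
# libqb version) makes libqb use filesystem sockets.
# $1 is an optional root prefix (hypothetical, for dry runs); when it is
# empty the real /etc/libqb directory is used.
enable_libqb_fs_sockets() {
    root="${1:-}"
    marker="$root/etc/libqb/force-filesystem-sockets"
    # Create the directory and an empty marker file, then print its path.
    mkdir -p "$(dirname "$marker")" && : > "$marker" && echo "$marker"
}

# Example:
#   enable_libqb_fs_sockets            # real system, needs root
#   enable_libqb_fs_sockets /tmp/try   # scratch directory
```

If this is the right knob, it would presumably have to be set
consistently on the host and inside every container, followed by a
restart of corosync and pacemaker, since both ends of each IPC
connection need to agree on the socket type.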

One problem with nested clusters is fencing; it's difficult to get
fencing working reliably in both clusters.

If the reason for the separation is policy, then VMs may be the only
way. Otherwise, if you just want to control resources inside the
containers, then the new bundles feature or the Pacemaker Remote
feature would be the best way to handle it.
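
For illustration, a bundle is an ordinary entry in the resources
section of the CIB. Bundles arrived in Pacemaker 1.1.17, so the 1.1.16
mentioned below would need an upgrade first. The sketch prints a
minimal bundle definition; the function name, bundle id, image name,
replica count, IP address and the Dummy agent are all placeholders and
not taken from the original setup, and the container image would need
to include the Pacemaker Remote daemon.

```shell
#!/bin/sh
# Sketch only: print a minimal <bundle> definition for the CIB.
# Arguments (all hypothetical placeholders):
#   $1 bundle id, $2 container image, $3 replica count,
#   $4 first IP address assigned to the container instances
bundle_xml() {
    id="$1"; image="$2"; replicas="$3"; ip_start="$4"
    cat <<EOF
<bundle id="$id">
  <docker image="$image" replicas="$replicas"/>
  <network ip-range-start="$ip_start"/>
  <primitive id="$id-app" class="ocf" provider="pacemaker" type="Dummy"/>
</bundle>
EOF
}

# Example (placeholder values), loading the result into the host cluster:
#   bundle_xml demo-bundle example/image:latest 3 192.168.122.131 > bundle.xml
#   cibadmin --create -o resources --xml-file bundle.xml
```

With this layout only the host runs a full cluster stack; the
containers are managed from the host cluster, so no second cluster has
to coexist with it on the same machine.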

> The environment is as follows:
> 1. corosync version 2.4.0, pacemaker version 1.1.16
> 2. a three-node cluster, and the container also has a pacemaker cluster
> This issue prevents the cluster from working normally when the node is
> restarted or the pacemakerd process is restarted.
> I ran a test: stop corosync (which leads to a pacemaker restart).
> The logs are as follows:
> ///////stop corosync//////
> [ubuntu at paas-controller-208-1-0-40:~]$ sudo su
> [root at paas-controller-208-1-0-40:/home/ubuntu]$ service corosync stop
> [root at paas-controller-208-1-0-40:/home/ubuntu]$ ps -elf | grep pacemaker
> 4 S root 16613 14434 0 80 0 - 26569 poll_s 19:09 pts/2 00:00:00 /usr/sbin/pacemakerd
> 4 S haclust+ 16619 16613 0 80 0 - 27481 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/cib
> 4 S root 16620 16613 0 80 0 - 27454 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/stonithd
> 4 S root 16622 16613 0 80 0 - 19155 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/lrmd
> 4 S haclust+ 16623 16613 0 80 0 - 25141 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/attrd
> 4 S haclust+ 16624 16613 0 80 0 - 20618 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/pengine
> 4 S haclust+ 16625 16613 0 80 0 - 29743 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/crmd
> 4 S root 16628 14465 0 80 0 - 26569 poll_s 19:09 pts/3 00:00:00 /usr/sbin/pacemakerd
> 4 S haclust+ 16631 16628 0 80 0 - 27357 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/cib
> 4 S root 16632 16628 0 80 0 - 27455 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/stonithd
> 4 S root 16633 16628 0 80 0 - 19155 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/lrmd
> 4 S haclust+ 16634 16628 0 80 0 - 25142 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/attrd
> 4 S haclust+ 16635 16628 0 80 0 - 20618 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/pengine
> 4 S haclust+ 16636 16628 0 80 0 - 29743 poll_s 19:09 ? 00:00:00 /usr/libexec/pacemaker/crmd
> 4 S root 23559 1 0 80 0 - 20416 hrtime 19:10 ? 00:00:00 /usr/sbin/pacemakerd -f
> 4 S root 25105 11245 0 80 0 - 28203 pipe_w 19:10 pts/5 00:00:00 grep --color=auto pacemaker
> 4 S root 31529 1 0 80 0 - 19012 poll_s 14:41 ? 00:00:40 /usr/libexec/pacemaker/lrmd
> 4 S haclust+ 31531 1 0 80 0 - 24467 poll_s 14:41 ? 00:00:29 /usr/libexec/pacemaker/pengine
> 
> Some pacemaker processes (crmd, attrd, cib, stonithd) seem to be lost,
> even after I restart pacemaker (service pacemaker start).
> Does anyone know how to deal with this? Thank you very much!
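
When reading a listing like the one above, it can help to separate the
daemons that still hang off a running pacemakerd from those that have
been reparented to PID 1. In the output above, lrmd (31529) and pengine
(31531) have PPID 1 and a 14:41 start time, so they are not children of
any pacemakerd shown. The sketch below assumes the `ps -elf` column
layout shown above (PID in column 4, PPID in column 5, command in
column 15); both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: read `ps -elf` output on stdin and print "PID command"
# for pacemaker child daemons whose parent is PID 1, i.e. daemons that
# are no longer attached to a pacemakerd.
pcmk_orphans() {
    awk '$5 == 1 && $15 ~ /\/pacemaker\/(cib|stonithd|lrmd|attrd|pengine|crmd)$/ {
        print $4, $15
    }'
}

# Sketch only: print the PID namespace of every pacemakerd, to tell the
# host instance apart from the ones running inside containers.
# Reading /proc/<pid>/ns/pid generally needs root.
pcmk_pidns() {
    for pid in $(pgrep -x pacemakerd); do
        printf '%s %s\n' "$pid" "$(readlink "/proc/$pid/ns/pid")"
    done
}

# Example:
#   ps -elf | pcmk_orphans
#   pcmk_pidns
```

A pacemakerd whose PID namespace differs from that of the host's init
belongs to a container, which is one way to tell the instances apart in
the listing.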
> 
> Ma Jinfeng
> Communication Protocol Software Development Engineer
> NIV Dept. II/Wireless Product R&D Institute/Wireless Product
> Operation Division
>
> ZTE Corporation
> ZTE D2070, 889 Bibo Road, Pudong New Area, Shanghai
> T: +86 021 xxxxxxxx      M: +86 17601320963
> E: ma.jinfeng at zte.com.cn
> www.zte.com.cn
-- 
Ken Gaillot <kgaillot at redhat.com>


