[ClusterLabs] Upgrade corosync problem

Salvatore D'angelo sasadangelo at gmail.com
Wed Jun 27 03:35:10 EDT 2018


Thanks for the reply and the detailed explanation. I am not using the --network=host option.
I have a docker image based on Ubuntu 14.04 where I only deploy this additional software:

	RUN apt-get update && apt-get install -y wget git xz-utils openssh-server \
		systemd-services make gcc pkg-config psmisc fuse libpython2.7 libopenipmi0 \
		libdbus-glib-1-2 libsnmp30 libtimedate-perl libpcap0.8

I also configure SSH with key pairs so that the containers can reach each other easily. The containers are created with these simple commands:

	docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN --device /dev/loop0 --device /dev/fuse --net ${PUBLIC_NETWORK_NAME} --publish ${PG1_SSH_PORT}:22 --ip ${PG1_PUBLIC_IP} --name ${PG1_PRIVATE_NAME} --hostname ${PG1_PRIVATE_NAME} -v ${MOUNT_FOLDER}:/Users ngha /bin/bash

	docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN --device /dev/loop1 --device /dev/fuse --net ${PUBLIC_NETWORK_NAME} --publish ${PG2_SSH_PORT}:22 --ip ${PG2_PUBLIC_IP} --name ${PG2_PRIVATE_NAME} --hostname ${PG2_PRIVATE_NAME} -v ${MOUNT_FOLDER}:/Users ngha /bin/bash

	docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN --device /dev/loop2 --device /dev/fuse --net ${PUBLIC_NETWORK_NAME} --publish ${PG3_SSH_PORT}:22 --ip ${PG3_PUBLIC_IP} --name ${PG3_PRIVATE_NAME} --hostname ${PG3_PRIVATE_NAME} -v ${MOUNT_FOLDER}:/Users ngha /bin/bash

/dev/fuse is used to configure glusterfs on two other nodes, and /dev/loopX is there just to better simulate my bare metal environment.
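
For reference, the three commands above differ only in the node number and the loop device, so they could be generated by a small loop. This is only a dry-run sketch: print_create_cmd is a hypothetical helper name, it assumes the PG<n>_* variables, PUBLIC_NETWORK_NAME and MOUNT_FOLDER are already set in the shell, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: print one "docker create" command per node.
# Assumes PG<n>_SSH_PORT, PG<n>_PUBLIC_IP, PG<n>_PRIVATE_NAME,
# PUBLIC_NETWORK_NAME and MOUNT_FOLDER are set by the caller.
print_create_cmd() {
    n="$1"                  # node number: 1, 2 or 3
    loop=$((n - 1))         # node 1 uses /dev/loop0, node 2 /dev/loop1, ...
    # Look up the per-node variables by name (PG1_SSH_PORT, PG2_SSH_PORT, ...).
    eval "port=\${PG${n}_SSH_PORT} ip=\${PG${n}_PUBLIC_IP} name=\${PG${n}_PRIVATE_NAME}"
    echo docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN \
        --device "/dev/loop${loop}" --device /dev/fuse \
        --net "${PUBLIC_NETWORK_NAME}" --publish "${port}:22" --ip "${ip}" \
        --name "${name}" --hostname "${name}" \
        -v "${MOUNT_FOLDER}:/Users" ngha /bin/bash
}

for n in 1 2 3; do
    print_create_cmd "$n"
done
```

Piping the output to sh would actually create the containers; I would check the printed commands first.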

One thing I do not understand: I compared corosync 2.3.5 (the old version, which worked fine) with 2.4.4 to look for differences, but I haven't found any change in the code that could be related to this issue. quorumtool.c and cfg.c are almost the same in both versions, so the issue is probably somewhere else.
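
A comparison like this can be sketched with git. The following is only an illustration under assumptions: a clone of the corosync repository in ./corosync, release tags named v2.3.5 and v2.4.4, and the file paths in the usage line; compare_releases, GIT and COROSYNC_SRC are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: summarise what changed between two corosync releases.
# Assumptions (not verified here): a clone of the corosync repository in
# ./corosync (override with COROSYNC_SRC) and tags named v2.3.5 / v2.4.4.
# GIT can be overridden, e.g. GIT=echo, to print the command instead of running it.
compare_releases() {
    old="$1"
    new="$2"
    shift 2
    # Remaining arguments are optional paths that limit the diff.
    "${GIT:-git}" -C "${COROSYNC_SRC:-corosync}" diff --stat "$old" "$new" -- "$@"
}

# Example usage (paths are assumptions about the source tree layout):
# compare_releases v2.3.5 v2.4.4 tools/corosync-quorumtool.c lib/cfg.c exec/cfg.c
```

Called without any paths, it lists every file that changed between the two tags, which may help when the relevant change is outside the files already checked.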

> On 27 Jun 2018, at 08:34, Jan Pokorný <jpokorny at redhat.com> wrote:
> On 26/06/18 17:56 +0200, Salvatore D'angelo wrote:
>> I did another test. I modified docker container in order to be able to run strace.
>> Running strace corosync-quorumtool -ps I got the following:
>> [snipped]
>> connect(5, {sa_family=AF_LOCAL, sun_path=@"cfg"}, 110) = 0
>> setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
>> sendto(5, "\377\377\377\377\0\0\0\0\30\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0", 24, MSG_NOSIGNAL, NULL, 0) = 24
>> setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0
>> recvfrom(5, 0x7ffd73bd7ac0, 12328, 16640, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
>> poll([{fd=5, events=POLLIN}], 1, 4294967295) = 1 ([{fd=5, revents=POLLIN}])
>> recvfrom(5, "\377\377\377\377\0\0\0\0(0\0\0\0\0\0\0\365\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0"..., 12328, MSG_WAITALL|MSG_NOSIGNAL, NULL, NULL) = 12328
>> shutdown(5, SHUT_RDWR)                  = 0
>> close(5)                                = 0
>> write(2, "Cannot initialise CFG service\n", 30Cannot initialise CFG service) = 30
>> [snipped]
> This just demonstrated the effect of already detailed server-side
> error in the client, which communicates with the server just fine,
> but as soon as the server hits the mmap-based problem, it bails
> out the observed way, leaving client unsatisfied.
> Note one thing, abstract Unix sockets are being used for the
> communication like this (observe the first line in the strace
> output excerpt above), and if you happen to run container via
> a docker command with --network=host, you may also be affected with
> issues arising from abstract sockets not being isolated but rather
> sharing the same namespace.  At least that was the case some years
> back and what asked for a switch in underlying libqb library to
> use strictly the file-backed sockets, where the isolation
> semantics matches the intuition:
> https://lists.clusterlabs.org/pipermail/users/2017-May/013003.html
> + way to enable (presumably only for container environments, note
> that there's no per process straightforward granularity):
> https://clusterlabs.github.io/libqb/1.0.2/doxygen/qb_ipc_overview.html
> (scroll down to "IPC sockets (Linux only)")
> You may test that if you are using said --network=host switch.
>> I tried to understand what happen behind the scene but it is not easy for me.
>> Hoping someone on this list can help.
> Containers are tricky, just as Ansible (as shown earlier on the list)
> can be, when encumbered with false beliefs and/or misunderstandings.
> Virtual machines may serve better wrt. insights for the later bare
> metal deployments.
> -- 
> Jan (Poki)
