[ClusterLabs] Upgrade corosync problem

Christine Caulfield ccaulfie at redhat.com
Fri Jun 29 09:00:42 UTC 2018


On 27/06/18 08:35, Salvatore D'angelo wrote:
> Hi,
> 
> Thanks for the reply and the detailed explanation. I am not using the
> --network=host option.
> I have a docker image based on Ubuntu 14.04 where I only deploy this
> additional software:
> 
> RUN apt-get update && apt-get install -y wget git xz-utils openssh-server \
>     systemd-services make gcc pkg-config psmisc fuse libpython2.7 libopenipmi0 \
>     libdbus-glib-1-2 libsnmp30 libtimedate-perl libpcap0.8
> 
> I then configure SSH with key pairs so the containers can communicate
> easily. The containers are created with these simple commands:
> 
> docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN \
>     --device /dev/loop0 --device /dev/fuse --net ${PUBLIC_NETWORK_NAME} \
>     --publish ${PG1_SSH_PORT}:22 --ip ${PG1_PUBLIC_IP} \
>     --name ${PG1_PRIVATE_NAME} --hostname ${PG1_PRIVATE_NAME} \
>     -v ${MOUNT_FOLDER}:/Users ngha /bin/bash
> 
> docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN \
>     --device /dev/loop1 --device /dev/fuse --net ${PUBLIC_NETWORK_NAME} \
>     --publish ${PG2_SSH_PORT}:22 --ip ${PG2_PUBLIC_IP} \
>     --name ${PG2_PRIVATE_NAME} --hostname ${PG2_PRIVATE_NAME} \
>     -v ${MOUNT_FOLDER}:/Users ngha /bin/bash
> 
> docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN \
>     --device /dev/loop2 --device /dev/fuse --net ${PUBLIC_NETWORK_NAME} \
>     --publish ${PG3_SSH_PORT}:22 --ip ${PG3_PUBLIC_IP} \
>     --name ${PG3_PRIVATE_NAME} --hostname ${PG3_PRIVATE_NAME} \
>     -v ${MOUNT_FOLDER}:/Users ngha /bin/bash
> 
> /dev/fuse is used to configure GlusterFS on two other nodes, and
> /dev/loopX is there just to better simulate my bare-metal environment.
> 
> One thing I do not understand: I compared corosync 2.3.5 (the old
> version that worked fine) with 2.4.4 to find the differences, but I
> haven’t found any change in the code that would explain the issue.
> The quorum tool.c and cfg.c are almost the same.
> Probably the issue is somewhere else.
> 

This might be asking a bit much, but would it be possible to try this
using Virtual Machines rather than Docker images? That would at least
eliminate a lot of complex variables.

Chrissie


> 
>> On 27 Jun 2018, at 08:34, Jan Pokorný <jpokorny at redhat.com> wrote:
>>
>> On 26/06/18 17:56 +0200, Salvatore D'angelo wrote:
>>> I did another test. I modified the Docker container so that I can
>>> run strace in it.
>>> Running strace corosync-quorumtool -ps I got the following:
>>
>>> [snipped]
>>> connect(5, {sa_family=AF_LOCAL, sun_path=@"cfg"}, 110) = 0
>>> setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
>>> sendto(5,
>>> "\377\377\377\377\0\0\0\0\30\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0", 24,
>>> MSG_NOSIGNAL, NULL, 0) = 24
>>> setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0
>>> recvfrom(5, 0x7ffd73bd7ac0, 12328, 16640, 0, 0) = -1 EAGAIN (Resource
>>> temporarily unavailable)
>>> poll([{fd=5, events=POLLIN}], 1, 4294967295) = 1 ([{fd=5,
>>> revents=POLLIN}])
>>> recvfrom(5,
>>> "\377\377\377\377\0\0\0\0(0\0\0\0\0\0\0\365\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0"...,
>>> 12328, MSG_WAITALL|MSG_NOSIGNAL, NULL, NULL) = 12328
>>> shutdown(5, SHUT_RDWR)                  = 0
>>> close(5)                                = 0
>>> write(2, "Cannot initialise CFG service\n", 30Cannot initialise CFG
>>> service) = 30
>>> [snipped]
>>
>> This just demonstrates, on the client side, the effect of the
>> server-side error already detailed: the client communicates with the
>> server just fine, but as soon as the server hits the mmap-based
>> problem, it bails out in the observed way, leaving the client
>> unsatisfied.
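
To make the reply in the strace excerpt above easier to read, here is a
minimal sketch that decodes its first 20 bytes. It assumes the reply starts
with three 32-bit fields (id, size, error), each aligned to 8 bytes, which is
my reading of the libqb IPC headers rather than something shown in this
thread, and it assumes a little-endian host, because od prints integers in
host byte order. The function name is only illustrative.

```shell
# Decode the start of the reply copied from the strace excerpt.
# Assumed layout: int32 id, 4 pad bytes, int32 size, 4 pad bytes, int32 error.
decode_reply() {
    printf '\377\377\377\377\0\0\0\0(0\0\0\0\0\0\0\365\377\377\377' |
        od -An -v -td4 |
        awk '{ for (i = 1; i <= NF; i++) v[++n] = $i }
             END { printf "id=%d size=%d error=%d\n", v[1], v[3], v[5] }'
}

decode_reply
```

On a little-endian machine this prints id=-1 size=12328 error=-11. The size
matches the 12328 bytes that recvfrom returned in the trace, and 11 is EAGAIN
on Linux, so under the assumed layout the server answered the connection
request with an error code instead of a usable connection.
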
>>
>> Note one thing: abstract Unix sockets are being used for the
>> communication here (observe the first line in the strace
>> output excerpt above), and if you happen to run the container via
>> a docker command with --network=host, you may also be affected by
>> issues arising from abstract sockets not being isolated but rather
>> sharing the same namespace.  At least that was the case some years
>> back, and it is what prompted a switch in the underlying libqb
>> library to use strictly file-backed sockets, where the isolation
>> semantics match the intuition:
>>
>> https://lists.clusterlabs.org/pipermail/users/2017-May/013003.html
>>
>> plus the way to enable it (presumably only for container
>> environments; note that there is no straightforward per-process
>> granularity):
>>
>> https://clusterlabs.github.io/libqb/1.0.2/doxygen/qb_ipc_overview.html
>> (scroll down to "IPC sockets (Linux only)")
>>
>> You may test that if you are using said --network=host switch.
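
For anyone wanting to try the file-backed sockets, a rough sketch follows. As
far as I recall from the libqb IPC overview linked above, the switch is an
empty marker file, by default /etc/libqb/force-filesystem-sockets; the path
can be changed when libqb is built, so treat it as an assumption about your
installation. The function names and the optional path arguments are only
illustrative and exist so the helpers can be tried on scratch files first.

```shell
# Default marker path (assumption: libqb was built with its default setting).
LIBQB_MARKER_DEFAULT=/etc/libqb/force-filesystem-sockets

# Create the empty marker file; needs root for the default path.
force_filesystem_sockets() {
    marker="${1:-$LIBQB_MARKER_DEFAULT}"
    mkdir -p "$(dirname "$marker")" && touch "$marker"
}

# Print the abstract sockets (names starting with '@') visible in the
# current network namespace, reading the table from /proc/net/unix.
list_abstract_sockets() {
    awk 'NR > 1 && $NF ~ /^@/ { print $NF }' "${1:-/proc/net/unix}"
}
```

Running list_abstract_sockets inside each container shows whether names such
as @cfg are visible there. The marker file is a global setting read when an
IPC connection is created, so run force_filesystem_sockets as root on every
node and then restart corosync and its clients so both sides use the same
kind of socket.
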
>>
>>> I tried to understand what happens behind the scenes, but it is not
>>> easy for me.
>>> Hoping someone on this list can help.
>>
>> Containers are tricky, just as Ansible (as shown earlier on the list)
>> can be, when encumbered with false beliefs and/or misunderstandings.
>> Virtual machines may serve better wrt. insights for the later bare
>> metal deployments.
>>
>> -- 
>> Jan (Poki)
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> 


