[ClusterLabs] Upgrade corosync problem

Mon Jul 2 15:19:43 UTC 2018

Hi All,

Today I tested the two suggestions you gave me. Here is what I did.
I made the changes in the script that creates my five-machine cluster (three nodes for the Pacemaker PostgreSQL cluster and two nodes for GlusterFS, which we use for database backups and WAL files).

FIRST TEST
----------
I added --shm-size=512m to the “docker create” command. I noticed that as soon as I start the container, the shm size is 512m, so I didn’t need to add the entry in /etc/fstab. However, I did it anyway:

tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0

and then
mount -o remount /dev/shm

Then I uninstalled all the software (crmsh, resource agents, corosync and pacemaker) and installed the new versions.
I started corosync and pacemaker, but the same problem occurred.
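
For anyone repeating the first test, here is a minimal sketch for confirming what the container actually sees on /dev/shm. The helper name shm_report is made up for illustration; it relies only on POSIX df and awk, and it assumes the filesystem name contains no spaces.

```shell
# Hypothetical helper: print total, used and available space (in KiB)
# for the filesystem that holds the given path (default: /dev/shm).
shm_report() {
    # df -P gives one stable line per filesystem; -k reports 1024-byte blocks.
    # Columns: filesystem, total, used, available, capacity, mount point.
    df -Pk "${1:-/dev/shm}" |
        awk 'NR == 2 { printf "total=%sK used=%sK avail=%sK\n", $2, $3, $4 }'
}

# Run inside the container, before and after starting corosync and pacemaker.
shm_report /dev/shm
```

Comparing the "used" figure before and after the daemons start gives a rough idea of how much shared memory they claim.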

SECOND TEST
-----------
Stopped corosync and pacemaker
Uninstalled corosync
Built corosync with --enable-small-memory-footprint and installed it
Started corosync and pacemaker

IT WORKED.
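
The rebuild in the second test can be sketched roughly as below. The source directory (./corosync), the install prefix (/usr) and the helper name build_corosync_small are assumptions for illustration, not details from this thread; only the --enable-small-memory-footprint option comes from the steps above. With DRY_RUN set, the helper prints the commands instead of running them.

```shell
# Hypothetical helper: rebuild corosync from a source checkout with the
# reduced-memory option. Arguments (both assumptions, adjust as needed):
#   $1  path to the corosync source tree   (default: ./corosync)
#   $2  install prefix                     (default: /usr)
build_corosync_small() {
    src_dir=${1:-./corosync}
    prefix=${2:-/usr}
    for cmd in \
        "./autogen.sh" \
        "./configure --prefix=$prefix --enable-small-memory-footprint" \
        "make" \
        "make install"
    do
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only show what would be executed.
            echo "(cd $src_dir && $cmd)"
        else
            # Real run: stop at the first failing step.
            (cd "$src_dir" && $cmd) || return 1
        fi
    done
}

# Print the build steps without touching the system.
DRY_RUN=1 build_corosync_small ./corosync /usr
```

Stop corosync and pacemaker before installing, and start them again afterwards, as in the steps listed above.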

I would now like to understand why it didn’t work in the first test and why it worked in the second. Which kind of memory is being used too much here? /dev/shm does not seem to be the problem: I allocated 512m on all three Docker containers (obviously on my single Mac) and enabled the container option as you suggested. Am I missing something here?

For the moment I want to use Docker only for testing, so it could be OK to use --enable-small-memory-footprint, but is there something I can do to get corosync working without this option?

The reason I am asking is that, in the future, we may deploy our cluster in production in a containerised way (for the moment it is just an idea). This would save a lot of time in developing, maintaining and deploying our patch system. All prerequisites and dependencies would be enclosed in the container, and if the IT team does some maintenance on the bare metal (e.g. installs new dependencies) it will not affect our containers. I do not see many performance drawbacks in using containers. The point is to understand whether a containerised approach could save us a lot of headaches in maintaining this cluster without affecting performance too much. I see this approach used in many contexts in cloud environments.

> On 2 Jul 2018, at 08:54, Christine Caulfield <ccaulfie at redhat.com> wrote:
> 
> On 29/06/18 17:20, Jan Pokorný wrote:
>> On 29/06/18 10:00 +0100, Christine Caulfield wrote:
>>> On 27/06/18 08:35, Salvatore D'angelo wrote:
>>>> One thing that I do not understand is that I tried to compare corosync
>>>> 2.3.5 (the old version that worked fine) and 2.4.4 to understand
>>>> differences but I haven’t found anything related to the piece of code
>>>> that affects the issue. The quorum tool.c and cfg.c are almost the same.
>>>> Probably the issue is somewhere else.
>>>> 
>>> 
>>> This might be asking a bit much, but would it be possible to try this
>>> using Virtual Machines rather than Docker images? That would at least
>>> eliminate a lot of complex variables.
>> 
>> Salvatore, you can ignore the part below, try following the "--shm"
>> advice in other part of this thread.  Also the previous suggestion
>> to compile corosync with --small-memory-footprint may be of help,
>> but comes with other costs (expect lower throughput).
>> 
>> 
>> Chrissie, I have a plausible explanation and if it's true, then the
>> same will be reproduced wherever /dev/shm is small enough.
>> 
>> If I am right, then the offending commit is
>> https://github.com/corosync/corosync/commit/238e2e62d8b960e7c10bfa0a8281d78ec99f3a26
>> (present since 2.4.3), and while it arranges things for the better
>> in the context of prioritized, low jitter process, it all of
>> a sudden prevents as-you-need memory acquisition from the system,
>> meaning that the memory consumption constraints are checked immediately
>> when the memory is claimed (as it must fit into dedicated physical
>> memory in full).  Hence this impact we likely never realized may
>> be perceived as a sort of a regression.
>> 
>> Since we can calculate the approximate requirements statically, might
>> be worthy to add something like README.requirements, detailing how much
>> space will be occupied for typical configurations at minimum, e.g.:
>> 
>> - standard + --small-memory-footprint configuration
>> - 2 + 3 + X nodes (5?)
>> - without any service on top + teamed with qnetd + teamed with
>>  pacemaker atop (including just IPC channels between pacemaker
>>  daemons and corosync's CPG service, indeed)
>> 
> 
> That is a possible explanation I suppose, yes. It's not something we can
> sensibly revert because it was already fixing another regression!
> 
> 
> I like the idea of documenting the /dev/shm requirements - that would
> certainly help with other people using containers - Salvatore mentioned
> earlier that there was nothing to guide him about the size needed. I'll
> raise an issue in github to cover it. Your input on how to do it for
> containers would also be helpful.
> 
> Chrissie