[ClusterLabs] Upgrade corosync problem

Christine Caulfield ccaulfie at redhat.com
Tue Jun 26 10:27:28 UTC 2018


On 26/06/18 11:24, Salvatore D'angelo wrote:
> Hi,
> 
> I have tried with:
> 0.16.0.real-1ubuntu4
> 0.16.0.real-1ubuntu5
> 
> which version should I try?


Hmm, both of those are actually quite old! Maybe try a newer one?

Chrissie

> 
>> On 26 Jun 2018, at 12:03, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>
>> On 26/06/18 11:00, Salvatore D'angelo wrote:
>>> Consider that the container is the same one where corosync 2.3.5 runs fine.
>>> If it is something related to the container, then 2.4.4 probably
>>> introduced a change that has an impact on containers.
>>> According to the code, it should be something related to libqb.
>>> Can anyone help?
>>>
>>
>>
>> Have you tried downgrading libqb to the previous version to see if it
>> still happens?
>>
>> Chrissie
>>
>>>> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>>
>>>> On 26/06/18 10:35, Salvatore D'angelo wrote:
>>>>> Sorry, after the command:
>>>>>
>>>>> corosync-quorumtool -ps
>>>>>
>>>>> the errors in the log are still visible. Looking at the source code, it
>>>>> seems the problem is at these lines:
>>>>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>>>>
>>>>>     if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>>>>         fprintf(stderr, "Cannot initialize QUORUM service\n");
>>>>>         q_handle = 0;
>>>>>         goto out;
>>>>>     }
>>>>>
>>>>>     if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>>>>         fprintf(stderr, "Cannot initialise CFG service\n");
>>>>>         c_handle = 0;
>>>>>         goto out;
>>>>>     }
>>>>>
>>>>> The quorum_initialize function is defined here:
>>>>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>>>>
>>>>> It seems to interact with libqb to allocate space on /dev/shm, but
>>>>> something fails. I tried to update libqb with apt-get install, but with
>>>>> no success.
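>>>>>
>>>>> A quick way to double-check which libqb the installed corosync actually
>>>>> uses (a sketch, assuming the Ubuntu package name libqb0 and that the
>>>>> corosync binary is on the PATH):
>>>>>
>>>>> # installed libqb package version
>>>>> dpkg -l libqb0
>>>>> # libqb shared object the corosync binary loads
>>>>> ldd $(which corosync) | grep -i libqb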
>>>>>
>>>>> The same applies to the second function:
>>>>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>>>>
>>>>> I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
>>>>>
>>>>> The folder /dev/shm has 777 permissions, like the other nodes with the
>>>>> older corosync and pacemaker that work fine. The only difference is that
>>>>> I only see files created by root, none created by hacluster as on the
>>>>> other two nodes (probably because pacemaker didn’t start correctly).
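>>>>>
>>>>> One way to see exactly which /dev/shm operation fails (assuming strace
>>>>> is available inside the container) would be something like:
>>>>>
>>>>> # trace the tool and its children, keep only the shm-related calls
>>>>> strace -f corosync-quorumtool -ps 2>&1 | grep -i shm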
>>>>>
>>>>> This is the analysis I have done so far.
>>>>> Any suggestions?
>>>>>
>>>>>
>>>>
>>>> Hmm. It seems very likely to be something to do with the way the
>>>> container is set up then - and I know nothing about containers. Sorry :/
>>>>
>>>> Can anyone else help here?
>>>>
>>>> Chrissie
>>>>
>>>>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo
>>>>>> <sasadangelo at gmail.com> wrote:
>>>>>>
>>>>>> Yes, sorry, you’re right, I could have found it by myself.
>>>>>> However, I did the following:
>>>>>>
>>>>>> 1. Added the line you suggested to /etc/fstab
>>>>>> 2. mount -o remount /dev/shm
>>>>>> 3. Now I correctly see /dev/shm of 512M with df -h
>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> overlay          63G   11G   49G  19% /
>>>>>> tmpfs            64M  4.0K   64M   1% /dev
>>>>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>>> osxfs           466G  158G  305G  35% /Users
>>>>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>>> *shm             512M   15M  498M   3% /dev/shm*
>>>>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>>> tmpfs           128M     0  128M   0% /tmp
>>>>>>
>>>>>> The errors in the log went away. Consider that I removed the log file
>>>>>> before starting corosync, so it does not contain lines from previous
>>>>>> executions.
>>>>>> <corosync.log>
>>>>>>
>>>>>> But the command:
>>>>>> corosync-quorumtool -ps
>>>>>>
>>>>>> still give:
>>>>>> Cannot initialize QUORUM service
>>>>>>
>>>>>> Consider that a few minutes earlier it gave me the message:
>>>>>> Cannot initialize CFG service
>>>>>>
>>>>>> I do not know the difference between CFG and QUORUM in this case.
>>>>>>
>>>>>> If I try to start pacemaker, the service starts OK, but I see only
>>>>>> pacemaker, and the transport does not work if I try to run a crm command.
>>>>>> Any suggestions?
>>>>>>
>>>>>>
>>>>>>> On 26 Jun 2018, at 10:49, Christine Caulfield
>>>>>>> <ccaulfie at redhat.com> wrote:
>>>>>>>
>>>>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Yes,
>>>>>>>>
>>>>>>>> I am reproducing only the part required for the test. I think the
>>>>>>>> original system has a larger shm. The problem is that I do not know
>>>>>>>> exactly how to change it.
>>>>>>>> I tried the following steps, but I have the impression I didn’t
>>>>>>>> perform the right ones:
>>>>>>>>
>>>>>>>> 1. remove everything under /tmp
>>>>>>>> 2. Added the following line to /etc/fstab:
>>>>>>>> tmpfs   /tmp   tmpfs   defaults,nodev,nosuid,mode=1777,size=128M   0  0
>>>>>>>> 3. mount /tmp
>>>>>>>> 4. df -h
>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>> overlay          63G   11G   49G  19% /
>>>>>>>> tmpfs            64M  4.0K   64M   1% /dev
>>>>>>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>>> osxfs           466G  158G  305G  35% /Users
>>>>>>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>>>>> shm              64M   11M   54M  16% /dev/shm
>>>>>>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>>>>> *tmpfs           128M     0  128M   0% /tmp*
>>>>>>>>
>>>>>>>> The errors are exactly the same.
>>>>>>>> I have the impression that I changed the wrong parameter. Probably I
>>>>>>>> have to change:
>>>>>>>> shm              64M   11M   54M  16% /dev/shm
>>>>>>>>
>>>>>>>> but I do not know how to do that. Any suggestions?
>>>>>>>>
>>>>>>>
>>>>>>> According to Google, you just add a new line to /etc/fstab for
>>>>>>> /dev/shm:
>>>>>>>
>>>>>>> tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0
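>>>>>>>
>>>>>>> For completeness, a minimal way to apply that without rebooting (a
>>>>>>> sketch, assuming the container lets you remount /dev/shm):
>>>>>>>
>>>>>>> # add the line above to /etc/fstab, then remount and check the new size
>>>>>>> mount -o remount /dev/shm
>>>>>>> df -h /dev/shm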
>>>>>>>
>>>>>>> Chrissie
>>>>>>>
>>>>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield
>>>>>>>>> <ccaulfie at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Let me add one important detail here. I use Docker for my test,
>>>>>>>>>> with 5 containers deployed on my Mac.
>>>>>>>>>> Basically, the team that worked on this project installed the
>>>>>>>>>> cluster on SoftLayer bare metal.
>>>>>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>>>>>> Testing it was cumbersome, considering that we access the machines
>>>>>>>>>> through a complex system that is hard to describe here.
>>>>>>>>>> For this reason I ported the cluster to Docker for test purposes.
>>>>>>>>>> I am not interested in having it run for months; I just need a
>>>>>>>>>> proof of concept.
>>>>>>>>>>
>>>>>>>>>> When the migration works I’ll port everything to bare metal, where
>>>>>>>>>> resources are abundant.
>>>>>>>>>>
>>>>>>>>>> Now I have enough RAM and disk space on my Mac, so if you tell me
>>>>>>>>>> what an acceptable size would be for several days of running, that
>>>>>>>>>> is OK for me.
>>>>>>>>>> It is also OK to have commands to clean the shm when required.
>>>>>>>>>> I know I can find them on Google, but if you can suggest this info
>>>>>>>>>> I’ll appreciate it. I have the OS knowledge to do that, but I would
>>>>>>>>>> like to avoid days of guesswork and trial and error if possible.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would recommend at least 128MB of space on /dev/shm, or 256MB if
>>>>>>>>> you can spare it. My 'standard' system uses 75MB under normal
>>>>>>>>> running, allowing for one command-line query to run.
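>>>>>>>>>
>>>>>>>>> As for cleaning /dev/shm when required: with corosync and pacemaker
>>>>>>>>> fully stopped, removing the leftover libqb ring-buffer files should
>>>>>>>>> be enough (a sketch, not something needed during normal operation):
>>>>>>>>>
>>>>>>>>> # only run this with the whole cluster stack stopped
>>>>>>>>> rm -f /dev/shm/qb-*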
>>>>>>>>>
>>>>>>>>> If I read this right, then you're reproducing a bare-metal system in
>>>>>>>>> containers now? So the original systems will have a default /dev/shm
>>>>>>>>> size which is probably much larger than your containers'?
>>>>>>>>>
>>>>>>>>> I'm just checking here that we don't have a regression in memory
>>>>>>>>> usage
>>>>>>>>> as Poki suggested.
>>>>>>>>>
>>>>>>>>> Chrissie
>>>>>>>>>
>>>>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>>>>> Thanks for the reply. I scratched my cluster, created it again,
>>>>>>>>>>>> and then migrated as before. This time I uninstalled pacemaker,
>>>>>>>>>>>> corosync, crmsh and the resource agents with make uninstall,
>>>>>>>>>>>> then I installed the new packages. The problem is the same; when
>>>>>>>>>>>> I launch:
>>>>>>>>>>>>
>>>>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>>>>
>>>>>>>>>>>> I got: Cannot initialize QUORUM service
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] couldn't create
>>>>>>>>>>>> circular mmap
>>>>>>>>>>>> on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ]
>>>>>>>>>>>> qb_rb_open:cfg-event-18020-18028-23: Resource temporarily
>>>>>>>>>>>> unavailable (11)
>>>>>>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer:
>>>>>>>>>>>> /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer:
>>>>>>>>>>>> /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] shm connection FAILED:
>>>>>>>>>>>> Resource temporarily unavailable (11)
>>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] Error in connection setup
>>>>>>>>>>>> (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>>>>
>>>>>>>>>>>> I tried to check /dev/shm and I am not sure these are the right
>>>>>>>>>>>> commands, however:
>>>>>>>>>>>>
>>>>>>>>>>>> df -h /dev/shm
>>>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>>>>>>>>
>>>>>>>>>>>> ls /dev/shm
>>>>>>>>>>>> qb-cmap-request-18020-18036-25-data
>>>>>>>>>>>> qb-cmap-request-18020-18036-25-header
>>>>>>>>>>>> qb-corosync-blackbox-data
>>>>>>>>>>>> qb-corosync-blackbox-header
>>>>>>>>>>>> qb-quorum-request-18020-18095-32-data
>>>>>>>>>>>> qb-quorum-request-18020-18095-32-header
>>>>>>>>>>>>
>>>>>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the
>>>>>>>>>>>> previous corosync release?
>>>>>>>>>>>
>>>>>>>>>>> For a start, can you try configuring corosync with the
>>>>>>>>>>> --enable-small-memory-footprint switch?
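>>>>>>>>>>>
>>>>>>>>>>> When building from the source tree, that would look roughly like
>>>>>>>>>>> this (a sketch; adjust any configure options to match how you
>>>>>>>>>>> built it the first time):
>>>>>>>>>>>
>>>>>>>>>>> ./autogen.sh
>>>>>>>>>>> ./configure --enable-small-memory-footprint
>>>>>>>>>>> make && make install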
>>>>>>>>>>>
>>>>>>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>>>>>>> opposite of generous (per today's standards), but may be the
>>>>>>>>>>> result
>>>>>>>>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>>>>>>>>> the above build-time toggle might help.
>>>>>>>>>>>
>>>>>>>>>>> If not, then exponentially increasing the size of the /dev/shm
>>>>>>>>>>> space is likely your best bet (I don't recommend fiddling with
>>>>>>>>>>> mlockall() and similar measures in corosync).
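>>>>>>>>>>>
>>>>>>>>>>> With Docker, the /dev/shm size is usually easiest to raise when
>>>>>>>>>>> the container is created, e.g. (a sketch; "pgcluster" is just a
>>>>>>>>>>> placeholder image name):
>>>>>>>>>>>
>>>>>>>>>>> # start the container with 512MB of /dev/shm instead of the 64MB default
>>>>>>>>>>> docker run --shm-size=512m pgcluster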
>>>>>>>>>>>
>>>>>>>>>>> Of course, feel free to raise a regression if you have a
>>>>>>>>>>> reproducible
>>>>>>>>>>> comparison between two corosync (plus possibly different
>>>>>>>>>>> libraries
>>>>>>>>>>> like libqb) versions, one that works and one that won't, in
>>>>>>>>>>> reproducible conditions (like this small /dev/shm, VM image,
>>>>>>>>>>> etc.).
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> Jan (Poki)
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 



More information about the Users mailing list