[ClusterLabs] Upgrade corosync problem

Tue Jun 26 06:24:09 EDT 2018

Hi,

I have tried with:
0.16.0.real-1ubuntu4
0.16.0.real-1ubuntu5

Which version should I try?
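
For reference, a minimal sketch of how the installed libqb version can be checked on Ubuntu. It assumes the library is packaged as libqb0 (an assumption; "dpkg -l | grep libqb" shows the actual name), and pkg_version is a hypothetical helper name, not an existing tool:

```shell
# Print the installed version of a Debian/Ubuntu package,
# or "not installed" if dpkg-query is missing or does not know it.
pkg_version() {
    dpkg-query -W -f='${Version}\n' "$1" 2>/dev/null || echo "not installed"
}

pkg_version libqb0   # libqb0 is an assumed package name

# Versions offered by the configured repositories:
#   apt-cache policy libqb0
# Installing one specific version, e.g. the older build listed above:
#   apt-get install libqb0=0.16.0.real-1ubuntu4
```

Whether a given version is still available depends on the repositories configured in the container.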

> On 26 Jun 2018, at 12:03, Christine Caulfield <ccaulfie at redhat.com> wrote:
> 
> On 26/06/18 11:00, Salvatore D'angelo wrote:
>> Consider that the container is the same one in which corosync 2.3.5 ran.
>> If this is related to the container, then 2.4.4 probably
>> introduced a feature that affects containers.
>> According to the code, it should be something related to libqb.
>> Can anyone help?
>> 
> 
> 
> Have you tried downgrading libqb to the previous version to see if it
> still happens?
> 
> Chrissie
> 
>>> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>> 
>>> On 26/06/18 10:35, Salvatore D'angelo wrote:
>>>> Sorry, after the command:
>>>> 
>>>> corosync-quorumtool -ps
>>>> 
>>>> the errors in the log are still visible. Looking at the source code, it seems
>>>> the problem is at these lines:
>>>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>>> 
>>>>     if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>>>         fprintf(stderr, "Cannot initialize QUORUM service\n");
>>>>         q_handle = 0;
>>>>         goto out;
>>>>     }
>>>> 
>>>>     if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>>>         fprintf(stderr, "Cannot initialise CFG service\n");
>>>>         c_handle = 0;
>>>>         goto out;
>>>>     }
>>>> 
>>>> The quorum_initialize function is defined here:
>>>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>>> 
>>>> It seems to interact with libqb to allocate space on /dev/shm, but
>>>> something fails. I tried to update libqb with apt-get install, but without
>>>> success.
>>>> 
>>>> The same goes for the second function:
>>>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>>> 
>>>> Now, I am not an expert in libqb. I have version 0.16.0.real-1ubuntu5.
>>>> 
>>>> The folder /dev/shm has 777 permissions, like the other nodes with older
>>>> corosync and pacemaker that work fine. The only difference is that I
>>>> only see files created by root, none created by hacluster as on the other
>>>> two nodes (probably because pacemaker didn’t start correctly).
>>>> 
>>>> This is the analysis I have done so far.
>>>> Any suggestion?
>>>> 
>>>> 
>>> 
>>> Hmm. It seems very likely to be something to do with the way the container is
>>> set up then - and I know nothing about containers. Sorry :/
>>> 
>>> Can anyone else help here?
>>> 
>>> Chrissie
>>> 
>>>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadangelo at gmail.com> wrote:
>>>>> 
>>>>> Yes, sorry, you’re right, I could have found it by myself.
>>>>> However, I did the following:
>>>>> 
>>>>> 1. Added the line you suggested to /etc/fstab
>>>>> 2. mount -o remount /dev/shm
>>>>> 3. Now df -h correctly shows /dev/shm at 512M:
>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>> overlay          63G   11G   49G  19% /
>>>>> tmpfs            64M  4.0K   64M   1% /dev
>>>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>> osxfs           466G  158G  305G  35% /Users
>>>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>> *shm             512M   15M  498M   3% /dev/shm*
>>>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>> tmpfs           128M     0  128M   0% /tmp
>>>>> 
>>>>> The errors in the log went away. Note that I remove the log file
>>>>> before starting corosync, so it does not contain lines from previous
>>>>> executions.
>>>>> <corosync.log>
>>>>> 
>>>>> But the command:
>>>>> corosync-quorumtool -ps
>>>>> 
>>>>> still gives:
>>>>> Cannot initialize QUORUM service
>>>>> 
>>>>> Note that a few minutes earlier it gave me the message:
>>>>> Cannot initialize CFG service
>>>>> 
>>>>> I do not know the differences between CFG and QUORUM in this case.
>>>>> 
>>>>> If I try to start pacemaker, the service starts OK, but I see only pacemaker,
>>>>> and the transport does not work if I try to run a crm command.
>>>>> Any suggestions?
>>>>> 
>>>>> 
>>>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>>>> 
>>>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Yes,
>>>>>>> 
>>>>>>> I am reproducing only the part required for testing. I think the original
>>>>>>> system has a larger shm. The problem is that I do not know exactly how
>>>>>>> to change it.
>>>>>>> I tried the following steps, but I have the impression I didn’t
>>>>>>> perform the right ones:
>>>>>>> 
>>>>>>> 1. remove everything under /tmp
>>>>>>> 2. Added the following line to /etc/fstab
>>>>>>> tmpfs   /tmp   tmpfs   defaults,nodev,nosuid,mode=1777,size=128M   0  0
>>>>>>> 3. mount /tmp
>>>>>>> 4. df -h
>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>> overlay          63G   11G   49G  19% /
>>>>>>> tmpfs            64M  4.0K   64M   1% /dev
>>>>>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>> osxfs           466G  158G  305G  35% /Users
>>>>>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>>>> shm              64M   11M   54M  16% /dev/shm
>>>>>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>>>> *tmpfs           128M     0  128M   0% /tmp*
>>>>>>> 
>>>>>>> The errors are exactly the same.
>>>>>>> I have the impression that I changed the wrong parameter. Probably I
>>>>>>> have to change:
>>>>>>> shm              64M   11M   54M  16% /dev/shm
>>>>>>> 
>>>>>>> but I do not know how to do that. Any suggestion?
>>>>>>> 
>>>>>> 
>>>>>> According to Google, you just add a new line to /etc/fstab for /dev/shm:
>>>>>> 
>>>>>> tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0
>>>>>> 
>>>>>> Chrissie
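
On the /dev/shm sizing suggested above: since the nodes in this test are Docker containers, the 64M shown by df matches Docker's default /dev/shm size, which can also be set when the container is created. A minimal dry-run sketch that only prints the commands; the helper name shm_resize_cmds and the IMAGE placeholder are hypothetical, and remounting from inside a container is assumed to need sufficient privileges:

```shell
# Print (not run) two ways of giving /dev/shm a larger size.
# $1 is the desired size in tmpfs/Docker notation, default 512m.
shm_resize_cmds() {
    size=${1:-512m}
    # Inside a running system or privileged container: resize the tmpfs in place.
    printf 'mount -o remount,size=%s /dev/shm\n' "$size"
    # When creating a container: let Docker size /dev/shm itself.
    # IMAGE is a placeholder, not a real image name.
    printf 'docker run --shm-size=%s IMAGE\n' "$size"
}

shm_resize_cmds 512m
```

The first command has the same effect as the fstab entry followed by a remount; the second avoids touching fstab inside the container at all.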
>>>>>> 
>>>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Let me add one important detail here. I use Docker for my tests, with 5
>>>>>>>>> containers deployed on my Mac.
>>>>>>>>> Basically, the team that worked on this project installed the cluster
>>>>>>>>> on SoftLayer bare metal.
>>>>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>>>>> Testing it was cumbersome, considering that we access the
>>>>>>>>> machines through a complex system that is hard to describe here.
>>>>>>>>> For this reason I ported the cluster to Docker for test purposes. I am
>>>>>>>>> not interested in having it work for months; I just need a proof of
>>>>>>>>> concept.
>>>>>>>>> 
>>>>>>>>> When the migration works, I’ll port everything to bare metal, where
>>>>>>>>> resources are abundant.
>>>>>>>>> 
>>>>>>>>> Now, I have enough RAM and disk space on my Mac, so if you tell me what
>>>>>>>>> an acceptable size would be for several days of running, that is fine
>>>>>>>>> for me.
>>>>>>>>> It is also fine to have commands to clean the shm when required.
>>>>>>>>> I know I can find them on Google, but if you can suggest them to me
>>>>>>>>> I’d appreciate it. I have the OS knowledge to do that, but I would like to
>>>>>>>>> avoid days of guesswork and trial and error if possible.
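
On cleaning the shm: the qb-* files seen in /dev/shm appear to carry process IDs in their names, so leftovers from dead processes can at least be listed before anything is removed. A minimal sketch, assuming the name layout qb-<name>-<server pid>-<client pid>-<fd>-<data|header> inferred from the listing in this thread (not taken from libqb documentation). The helper names are hypothetical, and the sketch only prints candidates; it deletes nothing:

```shell
# True if a process with the given PID exists (kill -0 sends no signal;
# the /proc check covers processes we are not allowed to signal).
pid_alive() {
    kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}

# Print qb-* files in a directory (default /dev/shm) whose embedded
# server and client PIDs both belong to processes that no longer exist.
list_stale_qb_files() {
    dir=${1:-/dev/shm}
    for f in "$dir"/qb-*; do
        [ -e "$f" ] || continue
        # Assumed layout: qb-<name>-<server pid>-<client pid>-<fd>-<data|header>
        pids=$(printf '%s\n' "${f##*/}" |
            sed -n 's/^qb-.*-\([0-9][0-9]*\)-\([0-9][0-9]*\)-[0-9][0-9]*-[a-z][a-z]*$/\1 \2/p')
        [ -n "$pids" ] || continue      # no PIDs in the name, e.g. the blackbox
        server=${pids% *}
        client=${pids#* }
        if ! pid_alive "$server" && ! pid_alive "$client"; then
            printf '%s\n' "$f"
        fi
    done
}

list_stale_qb_files /dev/shm
```

Files without embedded PIDs, such as qb-corosync-blackbox-*, are skipped. Whether it is safe to remove a listed file is a separate question that this sketch does not answer.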
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can
>>>>>>>> spare it. My 'standard' system uses 75MB under normal running, allowing
>>>>>>>> for one command-line query to run.
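
The 128MB guideline above can be checked with a small sketch like the following. The helper names fs_total_kb and fs_at_least_mb are made up for illustration, and the parsing assumes the POSIX output format of "df -Pk":

```shell
# Total size, in KiB, of the filesystem that holds the given path.
# Prints nothing if the path does not exist.
fs_total_kb() {
    df -Pk "$1" 2>/dev/null | awk 'NR==2 {print $2}'
}

# Succeeds when the filesystem holding $1 is at least $2 MiB in size.
fs_at_least_mb() {
    total_kb=$(fs_total_kb "$1")
    [ -n "$total_kb" ] && [ "$total_kb" -ge $(( $2 * 1024 )) ]
}

if fs_at_least_mb /dev/shm 128; then
    echo "/dev/shm is at least 128 MiB"
else
    echo "/dev/shm is smaller than 128 MiB (or missing)"
fi
```

This looks at the total size of the mount, not at the space currently free; the fourth df column would give the latter.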
>>>>>>>> 
>>>>>>>> If I read this right, you're now reproducing a bare-metal system in
>>>>>>>> containers? So the original systems will have a default /dev/shm
>>>>>>>> size which is probably much larger than your containers'?
>>>>>>>> 
>>>>>>>> I'm just checking here that we don't have a regression in memory usage,
>>>>>>>> as Poki suggested.
>>>>>>>> 
>>>>>>>> Chrissie
>>>>>>>> 
>>>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>>>> Thanks for the reply. I scrapped my cluster, created it again, and
>>>>>>>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>>>>>>>> corosync, crmsh and resource agents with make uninstall,
>>>>>>>>>>> then installed the new packages. The problem is the same; when I launch:
>>>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>>> 
>>>>>>>>>>> I got: Cannot initialize QUORUM service
>>>>>>>>>>> 
>>>>>>>>>>> Here the log with debug enabled:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>>> 
>>>>>>>>>>> I tried to check /dev/shm and I am not sure these are the right
>>>>>>>>>>> commands, however:
>>>>>>>>>>> 
>>>>>>>>>>> df -h /dev/shm
>>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>>>>>>> 
>>>>>>>>>>> ls /dev/shm
>>>>>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>>>>>> 
>>>>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the previous
>>>>>>>>>>> corosync release?
>>>>>>>>>> 
>>>>>>>>>> For a start, can you try configuring corosync with
>>>>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>>>> 
>>>>>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>>>>>> opposite of generous (per today's standards), but may be the result
>>>>>>>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>>>>>>>> the above build-time toggle might help.
>>>>>>>>>> 
>>>>>>>>>> If not, then exponentially increasing size of /dev/shm space is
>>>>>>>>>> likely your best bet (I don't recommend fiddling with mlockall()
>>>>>>>>>> and similar measures in corosync).
>>>>>>>>>> 
>>>>>>>>>> Of course, feel free to raise a regression if you have a reproducible
>>>>>>>>>> comparison between two corosync (plus possibly different libraries
>>>>>>>>>> like libqb) versions, one that works and one that won't, in
>>>>>>>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>>>>>>> 
>>>>>>>>>> -- 
>>>>>>>>>> Jan (Poki)
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>>>> 
>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>> Bugs: http://bugs.clusterlabs.org
