[ClusterLabs] Upgrade corosync problem
Salvatore D'angelo
sasadangelo at gmail.com
Tue Jun 26 06:00:00 EDT 2018
Note that the container is the same one where corosync 2.3.5 runs fine.
If it is something related to the container, then 2.4.4 probably introduced a change that affects containers.
According to the code, it should be something related to libqb.
Can anyone help?
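
A sketch of how the failing libqb allocation might be confirmed from inside the container (assuming strace is available there):

strace -f -e trace=openat,mmap corosync-quorumtool -ps

This should show which /dev/shm file the mmap fails on, and with which errno.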
> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaulfie at redhat.com> wrote:
>
> On 26/06/18 10:35, Salvatore D'angelo wrote:
>> Sorry, after the command:
>>
>> corosync-quorumtool -ps
>>
>> the errors in the log are still visible. Looking at the source code, it
>> seems the problem is in these lines:
>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>
>> /* connect to the quorum service over libqb IPC (shared memory) */
>> if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>         fprintf(stderr, "Cannot initialize QUORUM service\n");
>>         q_handle = 0;
>>         goto out;
>> }
>>
>> /* connect to the CFG service in the same way */
>> if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>         fprintf(stderr, "Cannot initialise CFG service\n");
>>         c_handle = 0;
>>         goto out;
>> }
>>
>> The quorum_initialize function is defined here:
>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>
>> It seems to interact with libqb to allocate space on /dev/shm, but
>> something fails. I tried to update libqb with apt-get install, but with
>> no success.
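>>
>> One way to verify which libqb the rebuilt tools actually load (a
>> sketch; the binary name is an assumption):
>>
>> dpkg -l | grep libqb
>> ldd $(which corosync-quorumtool) | grep libqb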
>>
>> The same happens with the second function:
>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>
>> I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
>>
>> The folder /dev/shm has 777 permissions, like on the other nodes with
>> older corosync and pacemaker that work fine. The only difference is that
>> I only see files created by root, and none created by hacluster as on
>> the other two nodes (probably because pacemaker didn't start correctly).
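>>
>> For comparison, ownership on each node can be checked with, e.g.:
>>
>> ls -l /dev/shm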
>>
>> This is the analysis I have done so far.
>> Any suggestion?
>>
>>
>
> Hmm. It seems very likely to be something to do with the way the container
> is set up then - and I know nothing about containers. Sorry :/
>
> Can anyone else help here?
>
> Chrissie
>
>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadangelo at gmail.com> wrote:
>>>
>>> Yes, sorry, you're right, I could have found it by myself.
>>> However, I did the following:
>>>
>>> 1. Added the line you suggested to /etc/fstab
>>> 2. mount -o remount /dev/shm
>>> 3. Now I correctly see /dev/shm of 512M with df -h
>>> Filesystem Size Used Avail Use% Mounted on
>>> overlay 63G 11G 49G 19% /
>>> tmpfs 64M 4.0K 64M 1% /dev
>>> tmpfs 1000M 0 1000M 0% /sys/fs/cgroup
>>> osxfs 466G 158G 305G 35% /Users
>>> /dev/sda1 63G 11G 49G 19% /etc/hosts
>>> *shm 512M 15M 498M 3% /dev/shm*
>>> tmpfs 1000M 0 1000M 0% /sys/firmware
>>> tmpfs 128M 0 128M 0% /tmp
>>>
>>> The errors in the log went away. Note that I removed the log file
>>> before starting corosync, so it does not contain lines from previous
>>> executions.
>>> <corosync.log>
>>>
>>> But the command:
>>> corosync-quorumtool -ps
>>>
>>> still gives:
>>> Cannot initialize QUORUM service
>>>
>>> Note that a few minutes earlier it gave me the message:
>>> Cannot initialize CFG service
>>>
>>> I do not know the differences between CFG and QUORUM in this case.
>>>
>>> If I try to start pacemaker, the service is OK, but I see only pacemaker,
>>> and the transport does not work if I try to run a crm command.
>>> Any suggestion?
>>>
>>>
>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>>
>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>> Hi,
>>>>>
>>>>> Yes,
>>>>>
>>>>> I am reproducing only the required part for testing. I think the original
>>>>> system has a larger shm. The problem is that I do not know exactly how
>>>>> to change it.
>>>>> I tried the following steps, but I have the impression I didn't
>>>>> perform the right one:
>>>>>
>>>>> 1. remove everything under /tmp
>>>>> 2. Added the following line to /etc/fstab
>>>>> tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>>>> 3. mount /tmp
>>>>> 4. df -h
>>>>> Filesystem Size Used Avail Use% Mounted on
>>>>> overlay 63G 11G 49G 19% /
>>>>> tmpfs 64M 4.0K 64M 1% /dev
>>>>> tmpfs 1000M 0 1000M 0% /sys/fs/cgroup
>>>>> osxfs 466G 158G 305G 35% /Users
>>>>> /dev/sda1 63G 11G 49G 19% /etc/hosts
>>>>> shm 64M 11M 54M 16% /dev/shm
>>>>> tmpfs 1000M 0 1000M 0% /sys/firmware
>>>>> *tmpfs 128M 0 128M 0% /tmp*
>>>>>
>>>>> The errors are exactly the same.
>>>>> I have the impression that I changed the wrong parameter. Probably I
>>>>> have to change:
>>>>> shm 64M 11M 54M 16% /dev/shm
>>>>>
>>>>> but I do not know how to do that. Any suggestion?
>>>>>
>>>>
>>>> According to Google, you just add a new line to /etc/fstab for /dev/shm:
>>>>
>>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
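>>>>
>>>> Then remount and check (a sketch; this assumes the container lets you
>>>> remount tmpfs):
>>>>
>>>> mount -o remount /dev/shm
>>>> df -h /dev/shm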
>>>>
>>>> Chrissie
>>>>
>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>>>>
>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Let me add one important detail here. I use Docker for my tests, with 5
>>>>>>> containers deployed on my Mac.
>>>>>>> Basically, the team that worked on this project installed the cluster
>>>>>>> on SoftLayer bare metal.
>>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>>> Testing was cumbersome, considering that we access the machines
>>>>>>> through a complex system that is hard to describe here.
>>>>>>> For this reason I ported the cluster to Docker for testing purposes.
>>>>>>> I am not interested in having it run for months; I just need a proof
>>>>>>> of concept.
>>>>>>>
>>>>>>> When the migration works, I'll port everything to bare metal, where
>>>>>>> resources are abundant.
>>>>>>>
>>>>>>> I have enough RAM and disk space on my Mac, so if you tell me what
>>>>>>> an acceptable size would be for several days of running, that is fine
>>>>>>> with me.
>>>>>>> It would also help to have commands to clean the shm when required.
>>>>>>> I know I can find them on Google, but I'd appreciate any suggestions.
>>>>>>> I have the OS knowledge to do that, but I would like to avoid days of
>>>>>>> guesswork and trial and error if possible.
>>>>>>
>>>>>>
>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can
>>>>>> spare it. My 'standard' system uses 75MB under normal running allowing
>>>>>> for one command-line query to run.
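>>>>>>
>>>>>> Since the nodes here are Docker containers, the /dev/shm size can
>>>>>> also be set when a container is created; a sketch (the name and image
>>>>>> are placeholders):
>>>>>>
>>>>>> docker run -d --shm-size=256m --name pg1 <your-cluster-image>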
>>>>>>
>>>>>> If I read this right, you're reproducing a bare-metal system in
>>>>>> containers now? So the original systems will have a default /dev/shm
>>>>>> size which is probably much larger than your containers'?
>>>>>>
>>>>>> I'm just checking here that we don't have a regression in memory usage
>>>>>> as Poki suggested.
>>>>>>
>>>>>> Chrissie
>>>>>>
>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>> Thanks for the reply. I scratched my cluster, created it again, and
>>>>>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>>>>>> corosync, crmsh, and the resource agents with make uninstall,
>>>>>>>>>
>>>>>>>>> then installed the new packages. The problem is the same when
>>>>>>>>> I launch:
>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>
>>>>>>>>> I got: Cannot initialize QUORUM service
>>>>>>>>>
>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [18019] pg3 corosyncerror [QB ] couldn't create circular mmap
>>>>>>>>> on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>> [18019] pg3 corosyncerror [QB ]
>>>>>>>>> qb_rb_open:cfg-event-18020-18028-23: Resource temporarily
>>>>>>>>> unavailable (11)
>>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer:
>>>>>>>>> /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer:
>>>>>>>>> /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>> [18019] pg3 corosyncerror [QB ] shm connection FAILED:
>>>>>>>>> Resource temporarily unavailable (11)
>>>>>>>>> [18019] pg3 corosyncerror [QB ] Error in connection setup
>>>>>>>>> (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>
>>>>>>>>> I tried to check /dev/shm, though I am not sure these are the right
>>>>>>>>> commands:
>>>>>>>>>
>>>>>>>>> df -h /dev/shm
>>>>>>>>> Filesystem Size Used Avail Use% Mounted on
>>>>>>>>> shm 64M 16M 49M 24% /dev/shm
>>>>>>>>>
>>>>>>>>> ls /dev/shm
>>>>>>>>> qb-cmap-request-18020-18036-25-data qb-corosync-blackbox-data
>>>>>>>>> qb-quorum-request-18020-18095-32-data
>>>>>>>>> qb-cmap-request-18020-18036-25-header qb-corosync-blackbox-header
>>>>>>>>> qb-quorum-request-18020-18095-32-header
>>>>>>>>>
>>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the
>>>>>>>>> previous corosync release?
>>>>>>>>
>>>>>>>> For a start, can you try configuring corosync with
>>>>>>>> --enable-small-memory-footprint switch?
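>>>>>>>>
>>>>>>>> That is a compile-time switch, roughly (since you already build from
>>>>>>>> source):
>>>>>>>>
>>>>>>>> ./configure --enable-small-memory-footprint
>>>>>>>> make && make install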
>>>>>>>>
>>>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>>>> opposite of generous (by today's standards), but it may be the result
>>>>>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>>>>>> the above build-time toggle might help.
>>>>>>>>
>>>>>>>> If not, then exponentially increasing the size of /dev/shm is
>>>>>>>> likely your best bet (I don't recommend fiddling with mlockall()
>>>>>>>> and similar measures in corosync).
>>>>>>>>
>>>>>>>> Of course, feel free to raise a regression if you have a reproducible
>>>>>>>> comparison between two corosync (plus possibly different libraries
>>>>>>>> like libqb) versions, one that works and one that won't, in
>>>>>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jan (Poki)
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org