[ClusterLabs] Upgrade corosync problem

Tue Jun 26 05:56:58 EDT 2018

On 26/06/18 10:35, Salvatore D'angelo wrote:
> Sorry after the command:
> 
> corosync-quorumtool -ps
> 
> the errors in the log are still visible. Looking at the source code, it seems
> the problem is at these lines:
> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
> 
>     if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>         fprintf(stderr, "Cannot initialize QUORUM service\n");
>         q_handle = 0;
>         goto out;
>     }
> 
>     if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>         fprintf(stderr, "Cannot initialise CFG service\n");
>         c_handle = 0;
>         goto out;
>     }
> 
> The quorum_initialize function is defined here:
> https://github.com/corosync/corosync/blob/master/lib/quorum.c
> 
> It seems to interact with libqb to allocate space on /dev/shm, but
> something fails. I tried to update libqb with apt-get install, but
> without success.
> 
> The same for second function:
> https://github.com/corosync/corosync/blob/master/lib/cfg.c
> 
> I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
> 
> The folder /dev/shm has 777 permissions, like the other nodes with older
> corosync and pacemaker that work fine. The only difference is that I
> see only files created by root and none created by hacluster, unlike the
> other two nodes (probably because pacemaker didn’t start correctly).
> 
> This is the analysis I have done so far.
> Any suggestion?
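
The ownership observation above can be checked with a short script. This is
only a sketch, assuming Python 3 is installed in the container; the helper
name shm_usage_by_owner is made up for illustration and is not part of
corosync, pacemaker, or libqb. It counts the regular files in a directory and
totals their apparent sizes per owning user, which makes it easy to see
whether anything owned by hacluster exists.

```python
import os
import pwd
from collections import defaultdict


def shm_usage_by_owner(directory="/dev/shm"):
    """Return {owner_name: (file_count, total_apparent_bytes)} for a directory."""
    totals = defaultdict(lambda: [0, 0])
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only regular files are counted; symlinks and subdirectories are skipped.
            if not entry.is_file(follow_symlinks=False):
                continue
            info = entry.stat(follow_symlinks=False)
            try:
                owner = pwd.getpwuid(info.st_uid).pw_name
            except KeyError:
                # The UID has no passwd entry (possible in minimal containers).
                owner = str(info.st_uid)
            totals[owner][0] += 1
            totals[owner][1] += info.st_size
    return {owner: tuple(counts) for owner, counts in totals.items()}
```

Calling shm_usage_by_owner() with no argument inspects /dev/shm; running it on
a working node and on the failing one should show whether the set of owners
really differs.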
> 
> 

Hmm. It seems very likely something to do with the way the container is
set up then - and I know nothing about containers. Sorry :/

Can anyone else help here?

Chrissie

>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadangelo at gmail.com> wrote:
>>
>> Yes, sorry, you’re right, I could have found it myself.
>> However, I did the following:
>>
>> 1. Added the line you suggested to /etc/fstab
>> 2. mount -o remount /dev/shm
>> 3. Now I correctly see /dev/shm of 512M with df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> overlay          63G   11G   49G  19% /
>> tmpfs            64M  4.0K   64M   1% /dev
>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>> osxfs           466G  158G  305G  35% /Users
>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>> *shm             512M   15M  498M   3% /dev/shm*
>> tmpfs          1000M     0 1000M   0% /sys/firmware
>> tmpfs           128M     0  128M   0% /tmp
>>
>> The errors in the log went away. Note that I remove the log file
>> before starting corosync, so it does not contain lines from previous
>> executions.
>> <corosync.log>
>>
>> But the command:
>> corosync-quorumtool -ps
>>
>> still gives:
>> Cannot initialize QUORUM service
>>
>> Note that a few minutes earlier it gave me the message:
>> Cannot initialize CFG service
>>
>> I do not know the differences between CFG and QUORUM in this case.
>>
>> If I try to start pacemaker, the service is OK, but I see only pacemaker,
>> and the transport does not work if I try to run a crm command.
>> Any suggestion?
>>
>>
>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>
>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>> Hi,
>>>>
>>>> Yes,
>>>>
>>>> I am reproducing only the part required for testing. I think the original
>>>> system has a larger shm. The problem is that I do not know exactly how
>>>> to change it.
>>>> I tried the following steps, but I have the impression I didn’t
>>>> perform the right ones:
>>>>
>>>> 1. remove everything under /tmp
>>>> 2. Added the following line to /etc/fstab
>>>> tmpfs   /tmp   tmpfs   defaults,nodev,nosuid,mode=1777,size=128M   0  0
>>>> 3. mount /tmp
>>>> 4. df -h
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> overlay          63G   11G   49G  19% /
>>>> tmpfs            64M  4.0K   64M   1% /dev
>>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>> osxfs           466G  158G  305G  35% /Users
>>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>> shm              64M   11M   54M  16% /dev/shm
>>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>>> *tmpfs           128M     0  128M   0% /tmp*
>>>>
>>>> The errors are exactly the same.
>>>> I have the impression that I changed the wrong parameter. Probably I
>>>> have to change:
>>>> shm              64M   11M   54M  16% /dev/shm
>>>>
>>>> but I do not know how to do that. Any suggestion?
>>>>
>>>
>>> According to google, you just add a new line to /etc/fstab for /dev/shm
>>>
>>> tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0
>>>
>>> Chrissie
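
To double-check what size such an fstab entry requests, the size= option can
be parsed into bytes. A minimal sketch, assuming Python 3; tmpfs_size_bytes is
a hypothetical helper, and it handles only plain numbers with an optional k,
m, or g suffix (percentage sizes return None).

```python
import re

# Multipliers for the size suffixes handled by this sketch.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def tmpfs_size_bytes(fstab_line):
    """Return the size= mount option of an fstab entry in bytes, or None."""
    fields = fstab_line.split()
    # An fstab entry needs at least: device, mount point, type, options.
    if len(fields) < 4 or fields[0].startswith("#"):
        return None
    for option in fields[3].split(","):
        match = re.fullmatch(r"size=(\d+)([kmg]?)", option, flags=re.IGNORECASE)
        if match:
            return int(match.group(1)) * _UNITS[match.group(2).lower()]
    return None
```

For the line suggested above, the helper returns 512 * 1024 * 1024 bytes, which
can then be compared with what df reports after the remount.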
>>>
>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>>>
>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Let me add here one important detail. I use Docker for my test with 5
>>>>>> containers deployed on my Mac.
>>>>>> Basically, the team that worked on this project installed the cluster
>>>>>> on SoftLayer bare metal.
>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>> Testing it was cumbersome, considering that we access the
>>>>>> machines through a complex system that is hard to describe here.
>>>>>> For this reason I ported the cluster to Docker for testing purposes. I am
>>>>>> not interested in having it work for months; I just need a proof of
>>>>>> concept.
>>>>>>
>>>>>> When the migration works I’ll port everything to bare metal, where
>>>>>> resources are abundant.
>>>>>>
>>>>>> Now, I have enough RAM and disk space on my Mac, so if you tell me what
>>>>>> an acceptable size would be for several days of running, that is fine
>>>>>> with me.
>>>>>> It is also fine to have commands to clean the shm when required.
>>>>>> I know I can find them on Google, but if you can suggest this information
>>>>>> I’d appreciate it. I have the OS knowledge to do that, but I would like to
>>>>>> avoid days of guesswork and trial and error if possible.
>>>>>
>>>>>
>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can
>>>>> spare it. My 'standard' system uses 75MB under normal running allowing
>>>>> for one command-line query to run.
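
The 128MB lower bound suggested above can be checked programmatically. A
minimal sketch, assuming Python 3 in the container; shm_is_large_enough and
MIN_SHM_BYTES are names invented here, and the threshold is just the figure
from this thread, not a value defined by corosync.

```python
import shutil

# Lower bound suggested in this thread (128MB), expressed in bytes.
MIN_SHM_BYTES = 128 * 2**20


def shm_is_large_enough(path="/dev/shm", minimum=MIN_SHM_BYTES):
    """Return True if the filesystem holding `path` has at least `minimum` bytes in total."""
    return shutil.disk_usage(path).total >= minimum
```

With the 64M /dev/shm shown by df in this thread, the check would be expected
to return False until the mount is enlarged.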
>>>>>
>>>>> If I read this right, you're now reproducing a bare-metal system in
>>>>> containers? So the original systems will have a default /dev/shm
>>>>> size which is probably much larger than your containers'?
>>>>>
>>>>> I'm just checking here that we don't have a regression in memory usage
>>>>> as Poki suggested.
>>>>>
>>>>> Chrissie
>>>>>
>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>>>>
>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>> Thanks for the reply. I scrapped my cluster, created it again, and
>>>>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>>>>> corosync, crmsh and resource agents with make uninstall
>>>>>>>>
>>>>>>>> then I installed new packages. The problem is the same, when
>>>>>>>> I launch:
>>>>>>>> corosync-quorumtool -ps
>>>>>>>>
>>>>>>>> I got: Cannot initialize QUORUM service
>>>>>>>>
>>>>>>>> Here is the log with debug enabled:
>>>>>>>>
>>>>>>>>
>>>>>>>> [18019] pg3 corosyncerror   [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>> [18019] pg3 corosyncerror   [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>> [18019] pg3 corosyncerror   [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>> [18019] pg3 corosyncerror   [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>
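
The error number in these messages can be pulled out of a log mechanically. On
Linux, errno 11 is EAGAIN, whose standard description is the "Resource
temporarily unavailable" text shown in the log. The following is a sketch
only, assuming Python 3 and the log layout quoted above; qb_errors is a
hypothetical helper, not a corosync or libqb tool.

```python
import re

# Matches error-priority lines from the QB subsystem; a trailing "(N)" is
# captured as the errno when present.
QB_ERROR = re.compile(
    r"corosync\s*error\s+\[QB\s*\]\s+(?P<msg>.*?)(?:\s+\((?P<errno>\d+)\))?\s*$"
)


def qb_errors(log_text):
    """Return a list of (message, errno or None) for each QB error line."""
    found = []
    for line in log_text.splitlines():
        match = QB_ERROR.search(line)
        if match:
            code = match.group("errno")
            found.append((match.group("msg"), int(code) if code else None))
    return found
```

Each log entry has to be on a single line for this to work, so a log pasted
through a mail client that wraps long lines needs to be unwrapped first.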
>>>>>>>> I tried to check /dev/shm and I am not sure these are the right
>>>>>>>> commands, however:
>>>>>>>>
>>>>>>>> df -h /dev/shm
>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>>>>
>>>>>>>> ls /dev/shm
>>>>>>>> qb-cmap-request-18020-18036-25-data
>>>>>>>> qb-cmap-request-18020-18036-25-header
>>>>>>>> qb-corosync-blackbox-data
>>>>>>>> qb-corosync-blackbox-header
>>>>>>>> qb-quorum-request-18020-18095-32-data
>>>>>>>> qb-quorum-request-18020-18095-32-header
>>>>>>>>
>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the previous
>>>>>>>> corosync release?
>>>>>>>
>>>>>>> For a start, can you try configuring corosync with
>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>
>>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>>> opposite of generous (per today's standards), but may be the result
>>>>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>>>>> the above build-time toggle might help.
>>>>>>>
>>>>>>> If not, then exponentially increasing the size of the /dev/shm space is
>>>>>>> likely your best bet (I don't recommend fiddling with mlockall()
>>>>>>> and similar measures in corosync).
>>>>>>>
>>>>>>> Of course, feel free to raise a regression if you have a reproducible
>>>>>>> comparison between two corosync (plus possibly different libraries
>>>>>>> like libqb) versions, one that works and one that won't, in
>>>>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>>>>
>>>>>>> -- 
>>>>>>> Jan (Poki)
>>>>>>> _______________________________________________
>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org