[ClusterLabs] Upgrade corosync problem

Salvatore D'angelo sasadangelo at gmail.com
Tue Jun 26 09:03:31 UTC 2018


Yes, sorry, you’re right; I could have found it by myself.
However, I did the following:

1. Added the line you suggested to /etc/fstab
2. mount -o remount /dev/shm
3. Now I correctly see /dev/shm at 512M with df -h:
Filesystem      Size  Used Avail Use% Mounted on
overlay          63G   11G   49G  19% /
tmpfs            64M  4.0K   64M   1% /dev
tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
osxfs           466G  158G  305G  35% /Users
/dev/sda1        63G   11G   49G  19% /etc/hosts
shm             512M   15M  498M   3% /dev/shm
tmpfs          1000M     0 1000M   0% /sys/firmware
tmpfs           128M     0  128M   0% /tmp
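For reference, the three steps above amount to the following (a sketch; run as root inside the container, with 512m being the size suggested earlier in the thread):

```shell
# 1. Make the /dev/shm size persistent across remounts via /etc/fstab:
echo 'tmpfs   /dev/shm   tmpfs   defaults,size=512m   0   0' >> /etc/fstab
# 2. Apply the new size to the live mount:
mount -o remount /dev/shm
# 3. Verify that df now reports the larger size:
df -h /dev/shm
```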

The errors in the log went away. Note that I removed the log file before starting corosync, so it does not contain lines from previous executions.


But the command:
corosync-quorumtool -ps

still gives:
Cannot initialize QUORUM service

Note that a few minutes earlier it gave me this message instead:
Cannot initialize CFG service

I do not know the difference between CFG and QUORUM in this case.

If I try to start Pacemaker, the service comes up OK, but I see only Pacemaker, and the transport does not work when I try to run a crm command.
Any suggestion?
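In case it helps, one way to check how much of /dev/shm the libqb ringbuffers are actually consuming while corosync runs (a sketch using standard tools; the qb-* names match the files shown in the logs quoted below):

```shell
# Overall headroom on the /dev/shm tmpfs:
df -h /dev/shm
# The individual libqb ringbuffer segments corosync has created:
ls -lh /dev/shm/qb-* 2>/dev/null
# Total space consumed by those segments:
du -ch /dev/shm/qb-* 2>/dev/null | tail -1
```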


> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaulfie at redhat.com> wrote:
> 
> On 26/06/18 09:40, Salvatore D'angelo wrote:
>> Hi,
>> 
>> Yes,
>> 
>> I am reproducing only the parts required for testing. I think the
>> original system has a larger /dev/shm. The problem is that I do not
>> know exactly how to change it.
>> I tried the following steps, but I have the impression I didn't
>> perform the right one:
>> 
>> 1. remove everything under /tmp
>> 2. Added the following line to /etc/fstab
>> tmpfs   /tmp         tmpfs   defaults,nodev,nosuid,mode=1777,size=128M 
>>         0  0
>> 3. mount /tmp
>> 4. df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> overlay          63G   11G   49G  19% /
>> tmpfs            64M  4.0K   64M   1% /dev
>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>> osxfs           466G  158G  305G  35% /Users
>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>> shm              64M   11M   54M  16% /dev/shm
>> tmpfs          1000M     0 1000M   0% /sys/firmware
>> tmpfs           128M     0  128M   0% /tmp
>> 
>> The errors are exactly the same.
>> I have the impression that I changed the wrong parameter. Probably I
>> have to change:
>> shm              64M   11M   54M  16% /dev/shm
>> 
>> but I do not know how to do that. Any suggestion?
>> 
> 
> According to google, you just add a new line to /etc/fstab for /dev/shm
> 
> tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0
> 
> Chrissie
> 
>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>> 
>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>> Hi,
>>>> 
>>>> Let me add here one important detail. I use Docker for my test with 5
>>>> containers deployed on my Mac.
>>>> Basically, the team that worked on this project installed the cluster
>>>> on SoftLayer bare metal.
>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>> occurred, recreating the cluster from scratch was not easy.
>>>> Testing it was cumbersome, considering that we access the machines
>>>> through a complex system that is hard to describe here.
>>>> For this reason I ported the cluster to Docker for test purposes. I am
>>>> not interested in having it work for months; I just need a proof of
>>>> concept. 
>>>> 
>>>> When the migration works I'll port everything to bare metal, where
>>>> resources are abundant.  
>>>> 
>>>> Now I have enough RAM and disk space on my Mac, so if you tell me what
>>>> an acceptable size would be for several days of running, that is fine for me.
>>>> It would also be good to have commands to clean /dev/shm when required.
>>>> I know I can find them on Google, but I would appreciate any pointers.
>>>> I have the OS knowledge to do that, but I would like to avoid days of
>>>> guesswork and trial and error if possible.
>>> 
>>> 
>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can
>>> spare it. My 'standard' system uses 75MB under normal running allowing
>>> for one command-line query to run.
>>> 
>>> If I read this right, you're reproducing a bare-metal system in
>>> containers now, so the original systems will have a default /dev/shm
>>> size that is probably much larger than your containers'?
>>> 
>>> I'm just checking here that we don't have a regression in memory usage
>>> as Poki suggested.
>>> 
>>> Chrissie
>>> 
>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>> 
>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>> Thanks for reply. I scratched my cluster and created it again and
>>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>>> corosync, crmsh and resource agents with make uninstall
>>>>>> 
>>>>>> then I installed the new packages. The problem is the same: when
>>>>>> I launch:
>>>>>> corosync-quorumtool -ps
>>>>>> 
>>>>>> I got: Cannot initialize QUORUM service
>>>>>> 
>>>>>> Here the log with debug enabled:
>>>>>> 
>>>>>> 
>>>>>> [18019] pg3 corosyncerror   [QB    ] couldn't create circular mmap
>>>>>> on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>> [18019] pg3 corosyncerror   [QB    ]
>>>>>> qb_rb_open:cfg-event-18020-18028-23: Resource temporarily
>>>>>> unavailable (11)
>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer:
>>>>>> /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer:
>>>>>> /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>> [18019] pg3 corosyncerror   [QB    ] shm connection FAILED:
>>>>>> Resource temporarily unavailable (11)
>>>>>> [18019] pg3 corosyncerror   [QB    ] Error in connection setup
>>>>>> (18020-18028-23): Resource temporarily unavailable (11)
>>>>>> 
>>>>>> I tried to check /dev/shm and I am not sure these are the right
>>>>>> commands, however:
>>>>>> 
>>>>>> df -h /dev/shm
>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>> 
>>>>>> ls /dev/shm
>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data
>>>>>>    qb-quorum-request-18020-18095-32-data
>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header
>>>>>>  qb-quorum-request-18020-18095-32-header
>>>>>> 
>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the
>>>>>> previous corosync release?
>>>>> 
>>>>> For a start, can you try configuring corosync with
>>>>> --enable-small-memory-footprint switch?
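The build-time switch mentioned above would be applied roughly like this (a sketch, assuming an unpacked corosync source tree that has already been bootstrapped):

```shell
# Rebuild corosync with the reduced shared-memory footprint option,
# then reinstall over the previous build:
./configure --enable-small-memory-footprint
make
make install
```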
>>>>> 
>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>> opposite of generous (per today's standards), but may be the result
>>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>>> the above build-time toggle might help.
>>>>> 
>>>>> If not, then exponentially increasing the size of the /dev/shm space
>>>>> is likely your best bet (I don't recommend fiddling with mlockall()
>>>>> and similar measures in corosync).
>>>>> 
>>>>> Of course, feel free to raise a regression if you have a reproducible
>>>>> comparison between two corosync (plus possibly different libraries
>>>>> like libqb) versions, one that works and one that won't, in
>>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>> 
>>>>> -- 
>>>>> Jan (Poki)
>>>>> _______________________________________________
>>>>> Users mailing list: Users at clusterlabs.org
>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>> 
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync.log
Type: application/octet-stream
Size: 42174 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180626/eeaa0a66/attachment-0001.obj>

