[ClusterLabs] Upgrade corosync problem

Tue Jun 26 04:49:43 EDT 2018

On 26/06/18 09:40, Salvatore D'angelo wrote:
> Hi,
> 
> Yes,
> 
> I am reproducing only the required part for test. I think the original
> system has a larger shm. The problem is that I do not know exactly how
> to change it.
> I tried the following steps, but I have the impression I didn't
> perform the right ones:
> 
> 1. Removed everything under /tmp
> 2. Added the following line to /etc/fstab:
> tmpfs   /tmp   tmpfs   defaults,nodev,nosuid,mode=1777,size=128M   0  0
> 3. mount /tmp
> 4. df -h
> Filesystem      Size  Used Avail Use% Mounted on
> overlay          63G   11G   49G  19% /
> tmpfs            64M  4.0K   64M   1% /dev
> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
> osxfs           466G  158G  305G  35% /Users
> /dev/sda1        63G   11G   49G  19% /etc/hosts
> shm              64M   11M   54M  16% /dev/shm
> tmpfs          1000M     0 1000M   0% /sys/firmware
> *tmpfs           128M     0  128M   0% /tmp*
> 
> The errors are exactly the same.
> I have the impression that I changed the wrong parameter. Probably I
> have to change:
> shm              64M   11M   54M  16% /dev/shm
> 
> but I do not know how to do that. Any suggestion?
> 

According to Google, you just add a new line to /etc/fstab for /dev/shm:

tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0
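
Since the test nodes here are Docker containers, an /etc/fstab entry may
not be picked up on its own, because a container normally gets /dev/shm
from the container runtime rather than from fstab processing at boot. As
a rough sketch (not tested against this setup), the size can be checked
and raised along these lines. The helper names shm_size_mb and
shm_is_big_enough are made up for illustration, and 128 is the minimum
suggested elsewhere in this thread. docker run --shm-size and
mount -o remount are standard options, but remounting inside a container
will probably need extra privileges.

```shell
#!/bin/sh
# Hypothetical helpers (not part of corosync or libqb).

# Print the total size of a mount point in MB (default: /dev/shm).
shm_size_mb() {
    # df -Pk gives POSIX one-line-per-filesystem output in 1024-byte
    # blocks; field 2 of the second line is the total size.
    df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { printf "%d\n", $2 / 1024 }'
}

# Succeed if the mount point ($2, default /dev/shm) has at least $1 MB.
shm_is_big_enough() {
    [ "$(shm_size_mb "${2:-/dev/shm}")" -ge "$1" ]
}

# Example check against the 128MB minimum mentioned in this thread:
#   shm_is_big_enough 128 || echo "/dev/shm is smaller than 128MB"
#
# Ways to enlarge it (pick one; both are assumptions about this setup):
#   docker run --shm-size=256m ...        # when creating the container
#   mount -o remount,size=256m /dev/shm   # in a running, privileged container
```

After either change, df -h /dev/shm inside the container should report
the new size.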

Chrissie

>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>
>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>> Hi,
>>>
>>> Let me add here one important detail. I use Docker for my test with 5
>>> containers deployed on my Mac.
>>> Basically, the team that worked on this project installed the cluster
>>> on SoftLayer bare metal.
>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>> occurred, recreating the cluster from scratch was not easy.
>>> Testing was cumbersome, considering that we access the machines
>>> through a complex system that is hard to describe here.
>>> For this reason I ported the cluster to Docker for testing purposes. I
>>> am not interested in having it work for months; I just need a proof of
>>> concept.
>>>
>>> Once the migration works, I'll port everything to bare metal, where
>>> resources are abundant.
>>>
>>> I have enough RAM and disk space on my Mac, so if you tell me what
>>> an acceptable size would be for several days of running, that is fine
>>> with me. It would also be fine to have commands to clean the shm when
>>> required.
>>> I know I can find them on Google, but I would appreciate it if you
>>> could suggest this information. I have the OS knowledge to do that, but
>>> I would like to avoid days of guesswork and trial and error if possible.
>>
>>
>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can
>> spare it. My 'standard' system uses 75MB under normal running allowing
>> for one command-line query to run.
>>
>> If I read this right then you're reproducing a bare-metal system in
>> containers now? So the original systems will have a default /dev/shm
>> size which is probably much larger than your containers?
>>
>> I'm just checking here that we don't have a regression in memory usage
>> as Poki suggested.
>>
>> Chrissie
>>
>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>
>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>> Thanks for the reply. I scrapped my cluster, created it again, and
>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>> corosync, crmsh and resource agents with make uninstall,
>>>>>
>>>>> then installed the new packages. The problem is the same. When
>>>>> I launch:
>>>>> corosync-quorumtool -ps
>>>>>
>>>>> I get: Cannot initialize QUORUM service
>>>>>
>>>>> Here is the log with debug enabled:
>>>>>
>>>>>
>>>>> [18019] pg3 corosyncerror   [QB    ] couldn't create circular mmap
>>>>> on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>> [18019] pg3 corosyncerror   [QB    ]
>>>>> qb_rb_open:cfg-event-18020-18028-23: Resource temporarily
>>>>> unavailable (11)
>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer:
>>>>> /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer:
>>>>> /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>> [18019] pg3 corosyncerror   [QB    ] shm connection FAILED:
>>>>> Resource temporarily unavailable (11)
>>>>> [18019] pg3 corosyncerror   [QB    ] Error in connection setup
>>>>> (18020-18028-23): Resource temporarily unavailable (11)
>>>>>
>>>>> I tried to check /dev/shm and I am not sure these are the right
>>>>> commands, however:
>>>>>
>>>>> df -h /dev/shm
>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>
>>>>> ls /dev/shm
>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data
>>>>>    qb-quorum-request-18020-18095-32-data
>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header
>>>>>  qb-quorum-request-18020-18095-32-header
>>>>>
>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the
>>>>> previous corosync release?
>>>>
>>>> For a start, can you try configuring corosync with
>>>> --enable-small-memory-footprint switch?
>>>>
>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>> opposite of generous (per today's standards), but may be the result
>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>> the above build-time toggle might help.
>>>>
>>>> If not, then exponentially increasing size of /dev/shm space is
>>>> likely your best bet (I don't recommend fiddling with mlockall()
>>>> and similar measures in corosync).
>>>>
>>>> Of course, feel free to raise a regression if you have a reproducible
>>>> comparison between two corosync (plus possibly different libraries
>>>> like libqb) versions, one that works and one that won't, in
>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>
>>>> -- 
>>>> Jan (Poki)
>>>> _______________________________________________
>>>> Users mailing list: Users at clusterlabs.org
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
> 