[ClusterLabs] Upgrade corosync problem

Christine Caulfield ccaulfie at redhat.com
Tue Jun 26 03:48:31 EDT 2018


On 25/06/18 20:41, Salvatore D'angelo wrote:
> Hi,
> 
> Let me add one important detail here: I use Docker for my tests, with 5 containers deployed on my Mac.
> The team that worked on this project installed the cluster on SoftLayer bare metal.
> The PostgreSQL cluster was hard to test, and if a misconfiguration occurred, recreating the cluster from scratch was not easy.
> Testing was cumbersome, considering that we access the machines through a complex system that is hard to describe here.
> For this reason I ported the cluster to Docker for testing purposes. I am not interested in having it run for months; I just need a proof of concept.
> 
> When the migration works, I’ll port everything to bare metal, where resources are abundant.
> 
> I have enough RAM and disk space on my Mac, so if you tell me what size is acceptable for several days of running, that is fine for me.
> Commands to clean up /dev/shm when required would also be welcome.
> I know I can find them on Google, but I would appreciate it if you could suggest them. I have the OS knowledge to do this, but I would like to avoid days of guesswork and trial and error if possible.
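Since cleanup commands are requested above, here is a cautious shell sketch, assuming the whole cluster stack (corosync and pacemaker) has already been stopped. It relies on libqb naming its files `qb-*`, as in the `ls /dev/shm` listing further down this thread. `clean_qb_shm` is a hypothetical helper, not something corosync or libqb ships, and its process check is only a rough heuristic.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with corosync or libqb: delete leftover
# libqb IPC files (qb-*) from $1 (default /dev/shm).
# Assumption: only safe once corosync and pacemaker are fully stopped.
clean_qb_shm() {
    dir=${1:-/dev/shm}
    # Heuristic guard: refuse if a process named corosync or pacemakerd exists.
    # It reads /proc/<pid>/comm, so it only protects you on Linux.
    if grep -qsxE 'corosync|pacemakerd' /proc/[0-9]*/comm; then
        echo "cluster stack still running; stop it first" >&2
        return 1
    fi
    for f in "$dir"/qb-*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        rm -f -- "$f"
    done
}
```

Files not starting with `qb-` are left alone, so other users of /dev/shm are not affected.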


I would recommend at least 128MB of space on /dev/shm, 256MB if you can
spare it. My 'standard' system uses 75MB in normal operation, allowing
for one command-line query to run.
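To check a container against the 128MB minimum suggested above, a small shell sketch can read the total size from `df`. The function name `shm_size_ok` is hypothetical, not part of corosync or libqb; it only uses POSIX `df -Pk` and `awk`.

```shell
#!/bin/sh
# Hypothetical helper, not part of corosync or libqb: succeed if the
# filesystem holding $2 (default /dev/shm) has at least $1 MiB in total.
shm_size_ok() {
    min_mb=$1
    path=${2:-/dev/shm}
    # -P keeps each filesystem on one line; -k reports 1024-byte blocks.
    # Line 2, field 2 is the total size, converted here to MiB.
    total_mb=$(df -Pk "$path" | awk 'NR == 2 { print int($2 / 1024) }')
    [ -n "$total_mb" ] && [ "$total_mb" -ge "$min_mb" ]
}
```

For example, `shm_size_ok 128 || echo 'grow /dev/shm'`. With the 64M reported by `df -h /dev/shm` in the quoted message, this check fails. Docker gives containers a 64MB /dev/shm by default; `docker run --shm-size=256m ...` raises it for a single container.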

If I read this right, you're now reproducing a bare-metal system in
containers? If so, the original systems will have a default /dev/shm
size that is probably much larger than the one in your containers?

I'm just checking here that we don't have a regression in memory usage
as Poki suggested.

Chrissie

>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny at redhat.com> wrote:
>>
>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>> Thanks for the reply. I scrapped my cluster, created it again and
>>> then migrated as before. This time I uninstalled pacemaker,
>>> corosync, crmsh and resource agents with make uninstall,
>>>
>>> then I installed the new packages. The problem is the same. When
>>> I launch:
>>> corosync-quorumtool -ps
>>>
>>> I get: Cannot initialize QUORUM service
>>>
>>> Here is the log with debug enabled:
>>>
>>>
>>> [18019] pg3 corosyncerror   [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>> [18019] pg3 corosyncerror   [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>> [18019] pg3 corosyncerror   [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
>>> [18019] pg3 corosyncerror   [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>
>>> I tried to check /dev/shm, though I am not sure these are the right
>>> commands:
>>>
>>> df -h /dev/shm
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> shm              64M   16M   49M  24% /dev/shm
>>>
>>> ls /dev/shm
>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>
>>> Is 64MB enough for /dev/shm? If not, why did it work with the previous
>>> corosync release?
>>
>> For a start, can you try configuring corosync with the
>> --enable-small-memory-footprint switch?
>>
>> Hard to say why the space provisioned to /dev/shm is the direct
>> opposite of generous (by today's standards), but it may be the result
>> of automatic HW adaptation, and if RAM is that scarce in your case,
>> the above build-time toggle might help.
>>
>> If not, then exponentially increasing the size of /dev/shm is
>> likely your best bet (I don't recommend fiddling with mlockall()
>> and similar measures in corosync).
>>
>> Of course, feel free to report a regression if you have a reproducible
>> comparison between two corosync versions (plus possibly different
>> library versions, like libqb), one that works and one that doesn't, in
>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>
>> -- 
>> Jan (Poki)
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
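To compare the libqb files in /dev/shm (the `qb-*` entries in the `ls /dev/shm` listing above) with the 75MB figure mentioned earlier, a rough shell sketch can total their allocated size. `qb_shm_usage_kb` is a hypothetical helper; it assumes every libqb file name starts with `qb-`, as in that listing. Each client connection gets its own request, response and event files (see the `qb-cfg-*` names in the log), so the total depends on how many clients are connected.

```shell
#!/bin/sh
# Hypothetical helper: print, in KiB, how much space the libqb files
# (named qb-*) currently occupy in $1 (default /dev/shm).
qb_shm_usage_kb() {
    dir=${1:-/dev/shm}
    total=0
    for f in "$dir"/qb-*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        # du -k reports blocks actually allocated, which is what fills tmpfs
        kb=$(du -k -- "$f" | awk '{ print $1 }')
        total=$((total + kb))
    done
    echo "$total"
}
```

Running it while corosync is up and comparing the result with the `Used` column of `df -h /dev/shm` shows how much of that space belongs to libqb.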



