[ClusterLabs] Upgrade corosync problem

Tue Jun 26 09:35:45 UTC 2018

Sorry, after running the command:

corosync-quorumtool -ps

the errors in the log are still visible. Looking at the source code, the problem seems to be at these lines:
https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c

	if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
		fprintf(stderr, "Cannot initialize QUORUM service\n");
		q_handle = 0;
		goto out;
	}

	if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
		fprintf(stderr, "Cannot initialise CFG service\n");
		c_handle = 0;
		goto out;
	}

The quorum_initialize function is defined here:
https://github.com/corosync/corosync/blob/master/lib/quorum.c

It seems to interact with libqb to allocate space on /dev/shm, but something fails. I tried updating libqb with apt-get install, but without success.
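
A minimal sketch for checking from C how much room is left on /dev/shm, assuming a POSIX system that provides statvfs(). The function names fs_avail_bytes and fs_has_room are illustrative only and are not part of corosync or libqb. The 128 MB figure mentioned in the usage note is the minimum recommended further down in this thread.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Return the bytes available to unprivileged users on the filesystem
 * that holds `path`, or -1 if statvfs() fails. */
long long fs_avail_bytes(const char *path)
{
	struct statvfs vfs;

	if (statvfs(path, &vfs) != 0) {
		perror("statvfs");
		return -1;
	}
	/* f_bavail is counted in units of f_frsize bytes. */
	return (long long)vfs.f_bavail * (long long)vfs.f_frsize;
}

/* Return 1 if at least `needed_mb` megabytes are available,
 * 0 if not, and -1 if the filesystem cannot be queried. */
int fs_has_room(const char *path, long long needed_mb)
{
	long long avail = fs_avail_bytes(path);

	if (avail < 0)
		return -1;
	return avail >= needed_mb * 1024 * 1024;
}
```

Calling fs_has_room("/dev/shm", 128) before starting corosync would show whether the mount is likely too small. It does not tell you why a particular libqb allocation failed.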

The same applies to the second function:
https://github.com/corosync/corosync/blob/master/lib/cfg.c

I am not a libqb expert. I have version 0.16.0.real-1ubuntu5.

The /dev/shm directory has 777 permissions, like on the other nodes running the older corosync and pacemaker, which work fine. The only difference is that I see only files created by root and none created by hacluster, unlike on the other two nodes (probably because pacemaker didn’t start correctly).
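
To compare file ownership between nodes, a small sketch along these lines could list who owns each libqb file. It assumes the files carry the qb- prefix seen in the ls output quoted below. list_qb_owners is a hypothetical helper name and is not a corosync or libqb function.

```c
#include <dirent.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Print "owner name" for every entry in `dir` whose name starts with
 * "qb-". Returns the number of entries printed, or -1 if the
 * directory cannot be opened. */
int list_qb_owners(const char *dir)
{
	DIR *d = opendir(dir);
	struct dirent *e;
	int count = 0;

	if (d == NULL) {
		perror("opendir");
		return -1;
	}
	while ((e = readdir(d)) != NULL) {
		char path[4096];
		struct stat st;
		struct passwd *pw;

		if (strncmp(e->d_name, "qb-", 3) != 0)
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
		if (stat(path, &st) != 0)
			continue; /* entry vanished or is unreadable */
		pw = getpwuid(st.st_uid);
		/* Fall back to "?" when the uid has no passwd entry. */
		printf("%-12s %s\n", pw ? pw->pw_name : "?", e->d_name);
		count++;
	}
	closedir(d);
	return count;
}
```

Running list_qb_owners("/dev/shm") on a working node and on the failing one makes the root versus hacluster difference easy to see side by side.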

This is the analysis I have done so far.
Any suggestions?

> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadangelo at gmail.com> wrote:
> 
> Yes, sorry, you’re right, I could have found it myself.
> However, I did the following:
> 
> 1. Added the line you suggested to /etc/fstab
> 2. mount -o remount /dev/shm
> 3. df -h now correctly shows /dev/shm at 512M:
> Filesystem      Size  Used Avail Use% Mounted on
> overlay          63G   11G   49G  19% /
> tmpfs            64M  4.0K   64M   1% /dev
> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
> osxfs           466G  158G  305G  35% /Users
> /dev/sda1        63G   11G   49G  19% /etc/hosts
> shm             512M   15M  498M   3% /dev/shm
> tmpfs          1000M     0 1000M   0% /sys/firmware
> tmpfs           128M     0  128M   0% /tmp
> 
> The errors in the log went away. Note that I remove the log file before starting corosync, so it does not contain lines from previous runs.
> <corosync.log>
> 
> But the command:
> corosync-quorumtool -ps
> 
> still gives:
> Cannot initialize QUORUM service
> 
> Note that a few minutes earlier it gave me the message:
> Cannot initialize CFG service
> 
> I do not know the differences between CFG and QUORUM in this case.
> 
> If I try to start pacemaker, the service is OK, but I see only pacemaker, and the Transport does not work if I try to run a crm command.
> Any suggestions?
> 
> 
>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaulfie at redhat.com> wrote:
>> 
>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>> Hi,
>>> 
>>> Yes,
>>> 
>>> I am reproducing only the part required for the test. I think the original
>>> system has a larger shm. The problem is that I do not know exactly how
>>> to change it.
>>> I tried the following steps, but I have the impression I didn’t
>>> perform the right ones:
>>> 
>>> 1. Removed everything under /tmp
>>> 2. Added the following line to /etc/fstab
>>> tmpfs   /tmp         tmpfs   defaults,nodev,nosuid,mode=1777,size=128M   0  0
>>> 3. mount /tmp
>>> 4. df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> overlay          63G   11G   49G  19% /
>>> tmpfs            64M  4.0K   64M   1% /dev
>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>> osxfs           466G  158G  305G  35% /Users
>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>> shm              64M   11M   54M  16% /dev/shm
>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>> *tmpfs           128M     0  128M   0% /tmp*
>>> 
>>> The errors are exactly the same.
>>> I have the impression that I changed the wrong parameter. Probably I
>>> have to change:
>>> shm              64M   11M   54M  16% /dev/shm
>>> 
>>> but I do not know how to do that. Any suggestions?
>>> 
>> 
>> According to google, you just add a new line to /etc/fstab for /dev/shm
>> 
>> tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0
>> 
>> Chrissie
>> 
>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>> 
>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>> Hi,
>>>>> 
>>>>> Let me add one important detail here. I use Docker for my test, with 5
>>>>> containers deployed on my Mac.
>>>>> The team that worked on this project installed the cluster
>>>>> on SoftLayer bare metal.
>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>> Testing it was cumbersome, considering that we access the
>>>>> machines through a complex system that is hard to describe here.
>>>>> For this reason I ported the cluster to Docker for testing purposes. I am
>>>>> not interested in having it work for months; I just need a proof of
>>>>> concept.
>>>>> 
>>>>> Once the migration works, I’ll port everything to bare metal, where
>>>>> resources are abundant.
>>>>> 
>>>>> I have enough RAM and disk space on my Mac, so if you tell me what
>>>>> an acceptable size would be for several days of running, that is fine for me.
>>>>> Commands to clean the shm when required would also be fine.
>>>>> I know I can find them on Google, but I’d appreciate it if you could
>>>>> point me to this information. I have the OS knowledge to do that, but I
>>>>> would like to avoid days of guesswork and trial and error if possible.
>>>> 
>>>> 
>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can
>>>> spare it. My 'standard' system uses 75MB under normal running allowing
>>>> for one command-line query to run.
>>>> 
>>>> If I read this right then you're reproducing a bare-metal system in
>>>> containers now? So the original systems will have a default /dev/shm
>>>> size which is probably much larger than your containers?
>>>> 
>>>> I'm just checking here that we don't have a regression in memory usage
>>>> as Poki suggested.
>>>> 
>>>> Chrissie
>>>> 
>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>>> 
>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>> Thanks for the reply. I scrapped my cluster, created it again, and
>>>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>>>> corosync, crmsh and resource agents with make uninstall,
>>>>>>> then installed the new packages. The problem is the same. When
>>>>>>> I launch:
>>>>>>> corosync-quorumtool -ps
>>>>>>> 
>>>>>>> I get: Cannot initialize QUORUM service
>>>>>>> 
>>>>>>> Here is the log with debug enabled:
>>>>>>> 
>>>>>>> 
>>>>>>> [18019] pg3 corosyncerror   [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>> [18019] pg3 corosyncerror   [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>> [18019] pg3 corosyncerror   [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>> [18019] pg3 corosyncerror   [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>> 
>>>>>>> I tried to check /dev/shm and I am not sure these are the right
>>>>>>> commands, however:
>>>>>>> 
>>>>>>> df -h /dev/shm
>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>>> 
>>>>>>> ls /dev/shm
>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>> 
>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the previous
>>>>>>> corosync release?
>>>>>> 
>>>>>> For a start, can you try configuring corosync with
>>>>>> --enable-small-memory-footprint switch?
>>>>>> 
>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>> opposite of generous (per today's standards), but may be the result
>>>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>>>> the above build-time toggle might help.
>>>>>> 
>>>>>> If not, then exponentially increasing size of /dev/shm space is
>>>>>> likely your best bet (I don't recommend fiddling with mlockall()
>>>>>> and similar measures in corosync).
>>>>>> 
>>>>>> Of course, feel free to raise a regression if you have a reproducible
>>>>>> comparison between two corosync (plus possibly different libraries
>>>>>> like libqb) versions, one that works and one that won't, in
>>>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>>> 
>>>>>> -- 
>>>>>> Jan (Poki)
>>>>>> _______________________________________________
>>>>>> Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org> <mailto:Users at clusterlabs.org <mailto:Users at clusterlabs.org>>
>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users <https://lists.clusterlabs.org/mailman/listinfo/users>
>>>>>> 
>>>>>> Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
>>>>>> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/>
