[ClusterLabs] Upgrade corosync problem

Salvatore D'angelo sasadangelo at gmail.com
Fri Jul 6 08:24:04 EDT 2018


Hi All,

The option --ulimit memlock=536870912 worked fine.

I now have another strange issue. The upgrade without updating libqb (leaving it at 0.16.0) worked fine.
If, after the upgrade, I stop pacemaker and corosync, download the latest libqb version:
https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.gz
and build and install it, everything works fine.

If instead I install the following in sequence (on top of the old installation):

libqb 1.0.3
corosync 2.4.4
pacemaker 1.1.18
crmsh 3.0.1
resource agents 4.1.1

when I try to start corosync, I get the following error:
Starting Corosync Cluster Engine (corosync): /etc/init.d/corosync: line 99:  8470 Aborted                 $prog $COROSYNC_OPTIONS > /dev/null 2>&1
[FAILED]

If I launch corosync -f, I get:
corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && __start___verbose != __stop___verbose' failed.

Nothing is logged (even in debug mode).

I do not understand why installing libqb as part of the normal upgrade sequence fails, while upgrading it after the crmsh/pacemaker/corosync/resource-agents upgrade works fine.
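A first check for the assertion above, before digging into build order: confirm which libqb the installed corosync actually resolves at run time, and whether the binary contains the `__verbose` section whose linker-provided bounds (`__start___verbose`, `__stop___verbose`) the assertion compares. This is a hypothetical helper (`inspect_corosync` is not part of corosync or libqb); it assumes `ldd` and binutils' `readelf` are available in the container and that corosync lives at /usr/sbin/corosync, so pass the real path if it was installed elsewhere.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of corosync or libqb).
# Assumes ldd and binutils' readelf are installed; the default path
# /usr/sbin/corosync is an assumption, pass the real one if different.
inspect_corosync() {
    bin=${1:-/usr/sbin/corosync}
    if [ ! -x "$bin" ]; then
        echo "not an executable: $bin" >&2
        return 2
    fi

    # Which libqb shared object the dynamic loader picks for this binary.
    echo "libqb resolved at run time:"
    ldd "$bin" 2>/dev/null | grep libqb || echo "  (none found)"

    # The assertion compares __start___verbose and __stop___verbose, the
    # bounds the linker derives from the "__verbose" section; check that
    # the section is present at all.
    if readelf -S --wide "$bin" 2>/dev/null | grep -q '__verbose'; then
        echo "__verbose section: present"
        return 0
    fi
    echo "__verbose section: missing"
    return 1
}
```

Running it after each install order (for example `inspect_corosync /usr/sbin/corosync`) and comparing the two outputs may show which build differs. A present section does not by itself rule out the problem, since the assertion is about the section being populated, not merely existing.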

On 3 Jul 2018, at 11:42, Christine Caulfield <ccaulfie at redhat.com> wrote:
> 
> On 03/07/18 07:53, Jan Pokorný wrote:
>> On 02/07/18 17:19 +0200, Salvatore D'angelo wrote:
>>> Today I tested the two suggestions you gave me. Here is what I did.
>>> I changed the script that creates my 5-machine cluster (I use three
>>> nodes for the pacemaker PostgreSQL cluster and two nodes for glusterfs,
>>> which we use for database backups and WAL files).
>>> 
>>> FIRST TEST
>>> ——————————
>>> I added --shm-size=512m to the “docker create” command. I noticed
>>> that as soon as I start the container the shm size is 512m, so I didn’t
>>> need to add the entry to /etc/fstab. However, I added it anyway:
>>> 
>>> tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0
>>> 
>>> and then
>>> mount -o remount /dev/shm
>>> 
>>> Then I uninstalled all the software (crmsh, resource agents,
>>> corosync and pacemaker) and installed the new versions.
>>> I started corosync and pacemaker, but the same problem occurred.
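To confirm what the container actually sees after `--shm-size=512m` and the remount, the size of /dev/shm can be read back in bytes. A minimal sketch, assuming a `df` that supports the POSIX `-P -k` options (GNU coreutils does); `shm_size_bytes` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the total size, in bytes, of the filesystem
# mounted at the given path (default /dev/shm).
shm_size_bytes() {
    mnt=${1:-/dev/shm}
    # -P: one line per filesystem; -k: sizes in 1024-byte blocks.
    # Line 2 is the data row, column 2 the total size.
    kb=$(df -P -k "$mnt" 2>/dev/null | awk 'NR == 2 { print $2 }')
    [ -n "$kb" ] || return 1
    echo $((kb * 1024))
}
```

With `--shm-size=512m` this should print 536870912. As this first test shows, a correct /dev/shm size alone did not get corosync started.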
>>> 
>>> SECOND TEST
>>> ———————————
>>> stopped corosync and pacemaker
>>> uninstalled corosync
>>> built corosync with --enable-small-memory-footprint and installed it
>>> started corosync and pacemaker
>>> 
>>> IT WORKED.
>>> 
>>> I would now like to understand why it didn’t work in the first test
>>> but did in the second. Which kind of memory is being exhausted
>>> here? /dev/shm does not seem to be the problem: I allocated 512m on all
>>> three docker containers (all on my single Mac) and enabled the container
>>> option as you suggested. Am I missing something?
>> 
>> My suspicion then shifts fully towards the "maximum number of bytes of
>> memory that may be locked into RAM" per-process resource limit
>> (RLIMIT_MEMLOCK), as raised in one of the most recent messages ...
>> 
>>> For the moment I want to use Docker only for testing, so it
>>> could be OK to use --enable-small-memory-footprint, but is there
>>> something I can do to get corosync working without this
>>> option?
>> 
>> ... so try running the container the already suggested way:
>> 
>>  docker run ... --ulimit memlock=33554432 ...
>> 
>> or possibly higher (as a rule of thumb, keep doubling the value
>> until it reaches some unreasonable amount, such as the equivalent
>> of the 512 MiB already in use).
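The doubling rule of thumb can be spelled out as a small loop that lists the candidate `--ulimit memlock=` values, starting from the suggested 33554432 (32 MiB) and stopping at 536870912 (512 MiB). Values are in bytes, matching the figures used in this thread; `memlock_candidates` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list memlock limits to try, doubling from a start
# value (default 32 MiB) up to and including a ceiling (default 512 MiB).
memlock_candidates() {
    value=${1:-33554432}     # 32 MiB, the value suggested above
    ceiling=${2:-536870912}  # 512 MiB
    while [ "$value" -le "$ceiling" ]; do
        echo "$value"
        value=$((value * 2))
    done
}
```

With the defaults it prints 33554432, 67108864, 134217728, 268435456 and 536870912, one per line; each can be tried in turn as `docker run ... --ulimit memlock=<value> ...`. The last one is the value reported as working in the 6 July message.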
>> 
>> Hope this helps.
> 
> This makes a lot of sense to me. As Poki pointed out earlier, in
> corosync 2.4.3 (I think) we fixed a regression that caused corosync
> NOT to be locked in RAM after it forked, which was causing potential
> performance issues. So if you replace an earlier corosync with 2.4.3 or
> later, it will use more locked memory than before.
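Since 2.4.3 and later keep their memory locked after forking, the memlock limit in force where corosync is started is what matters. A minimal pre-flight check, assuming a shell whose `ulimit -l` reports the limit in 1024-byte units or as `unlimited` (bash and dash do). `memlock_ok` is a hypothetical helper, and the byte figure to compare against is your own estimate: 536870912 is only the value that worked in this thread, not a documented corosync requirement.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed if a memlock limit, as printed by
# "ulimit -l" (1024-byte units or "unlimited"), covers the wanted bytes.
memlock_ok() {
    limit=$1    # e.g. "$(ulimit -l)": 65536 or unlimited
    needed=$2   # bytes, e.g. 536870912
    if [ "$limit" = unlimited ]; then
        return 0
    fi
    [ $((limit * 1024)) -ge "$needed" ]
}
```

Inside the container, `memlock_ok "$(ulimit -l)" 536870912 || echo "memlock limit too low" >&2` can run before starting corosync. A container started with `--ulimit memlock=536870912` should report 524288 from `ulimit -l`.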
> 
> Chrissie
> 
> 
>> 
>>> I am asking because in the future we might deploy our production
>>> cluster in containers (for the moment it is just an idea). This would
>>> save a lot of time in developing, maintaining and deploying our patch
>>> system. All prerequisites and dependencies would be enclosed in the
>>> containers, and if the IT team does maintenance on the bare metal
>>> (e.g. installs new dependencies), it will not affect our containers.
>>> I do not see many performance drawbacks in using containers. The point
>>> is to understand whether a containerised approach could save us a lot
>>> of headaches in maintaining this cluster without affecting performance
>>> too much. I have noticed this approach in a lot of contexts in cloud
>>> environments.
>> 
>> 
>> 
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>> https://lists.clusterlabs.org/mailman/listinfo/users <https://lists.clusterlabs.org/mailman/listinfo/users>
>> 
>> Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
>> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/>
>> 
> 

