[ClusterLabs] Upgrade corosync problem

Fri Jul 6 13:25:30 UTC 2018

Hi,

Thanks for the reply. The problem is the opposite of what you describe.

When I build corosync against the old libqb, verify that the updated node works properly, and only then install the new hand-compiled libqb, everything works fine.
But in a normal upgrade procedure I first build libqb (after removing the old one) and then corosync, and when I follow this order it does not work.
This is what is driving me crazy.
I do not understand this behavior.
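
To make the order I am describing concrete, here is a rough sketch of the sequence as a dry-run shell script. It is not my actual upgrade script: SRC_ROOT, the directory names and the helper names run and rebuild_stack are assumptions for illustration, and it assumes release tarballs that already ship a configure script. Other components that link against libqb (pacemaker, for example) would follow in the same way.

```shell
#!/bin/sh
# Sketch only: SRC_ROOT and the directory names below are assumptions about
# where the release tarballs were unpacked; adjust them to your own layout.
SRC_ROOT="${SRC_ROOT:-/usr/local/src}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# libqb first, then the daemon that links against it.
COMPONENTS="libqb-1.0.3 corosync-2.4.4"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_stack() {
    for comp in $COMPONENTS; do
        dir="$SRC_ROOT/$comp"
        run sh -c "cd '$dir' && ./configure" || return 1
        # Throw away objects that were compiled against the previous libqb.
        run make -C "$dir" clean || return 1
        run make -C "$dir" || return 1
        run make -C "$dir" install || return 1
        # Refresh the dynamic linker cache before the next component builds.
        run ldconfig || return 1
    done
}
```

Calling rebuild_stack with DRY_RUN=1 only prints the commands, libqb first and corosync second; DRY_RUN=0 would run them for real and needs root for the install steps.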

> On 6 Jul 2018, at 14:40, Christine Caulfield <ccaulfie at redhat.com> wrote:
> 
> On 06/07/18 13:24, Salvatore D'angelo wrote:
>> Hi All,
>> 
>> The option --ulimit memlock=536870912 worked fine.
>> 
>> I now have another strange issue. The upgrade without updating libqb
>> (leaving 0.16.0 in place) worked fine.
>> If, after the upgrade, I stop pacemaker and corosync, download the
>> latest libqb version:
>> https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.gz
>> and build and install it, everything works fine.
>> 
>> If I try to install in sequence (after the installation of old code):
>> 
>> libqb 1.0.3
>> corosync 2.4.4
>> pacemaker 1.1.18
>> crmsh 3.0.1
>> resource agents 4.1.1
>> 
>> when I try to start corosync I get the following error:
>> *Starting Corosync Cluster Engine (corosync): /etc/init.d/corosync: line
>> 99:  8470 Aborted                 $prog $COROSYNC_OPTIONS > /dev/null 2>&1*
>> *[FAILED]*
> 
> 
> Yes. You can't randomly swap in and out hand-compiled libqb versions.
> Find one that works and stick to it. It's an annoying 'feature' of newer
> linkers that we had to work around in libqb. So if you rebuild libqb
> 1.0.3 then you will, in all likelihood, need to rebuild corosync to
> match it.
> 
> Chrissie
> 
> 
>> 
>> If I launch corosync -f I get:
>> *corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite
>> section is populated, otherwise target's build is at fault, preventing
>> reliable logging" && __start___verbose != __stop___verbose' failed.*
>> 
>> Nothing is logged (even in debug mode).
>> 
>> I do not understand why installing libqb during the normal upgrade
>> process fails while if I upgrade it after the
>> crmsh/pacemaker/corosync/resourceagents upgrade it works fine. 
>> 
>> On 3 Jul 2018, at 11:42, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>> 
>>> On 03/07/18 07:53, Jan Pokorný wrote:
>>>> On 02/07/18 17:19 +0200, Salvatore D'angelo wrote:
>>>>> Today I tested the two suggestions you gave me. Here is what I did
>>>>> in the script where I create my 5-machine cluster (I use three
>>>>> nodes for the pacemaker PostgreSQL cluster and two nodes for
>>>>> glusterfs, which we use for database backups and WAL files).
>>>>> 
>>>>> FIRST TEST
>>>>> ——————————
>>>>> I added --shm-size=512m to the “docker create” command. I noticed
>>>>> that as soon as I start the container the shm size is 512m, so I
>>>>> did not need to add the entry in /etc/fstab. However, I did it anyway:
>>>>> 
>>>>> tmpfs      /dev/shm      tmpfs   defaults,size=512m   0   0
>>>>> 
>>>>> and then
>>>>> mount -o remount /dev/shm
>>>>> 
>>>>> Then I uninstalled all pieces of software (crmsh, resource agents,
>>>>> corosync and pacemaker) and installed the new ones.
>>>>> I started corosync and pacemaker, but the same problem occurred.
>>>>> 
>>>>> SECOND TEST
>>>>> ———————————
>>>>> stopped corosync and pacemaker
>>>>> uninstalled corosync
>>>>> built corosync with --enable-small-memory-footprint and installed it
>>>>> started corosync and pacemaker
>>>>> 
>>>>> IT WORKED.
>>>>> 
>>>>> I would now like to understand why it did not work in the first
>>>>> test and why it worked in the second. Which kind of memory is being
>>>>> overused here? /dev/shm does not seem to be the problem: I allocated
>>>>> 512m on all three docker images (obviously on my single Mac) and
>>>>> enabled the container option as you suggested. Am I missing
>>>>> something here?
>>>> 
>>>> My suspicion then fully shifts towards the "maximum number of bytes
>>>> of memory that may be locked into RAM" per-process resource limit,
>>>> as raised in one of the most recent messages ...
>>>> 
>>>>> For the moment I want to use Docker only for test purposes, so it
>>>>> could be OK to use --enable-small-memory-footprint, but is there
>>>>> something I can do to get corosync working even without this
>>>>> option?
>>>> 
>>>> ... so try running the container the already suggested way:
>>>> 
>>>>  docker run ... --ulimit memlock=33554432 ...
>>>> 
>>>> or possibly higher (as a rule of thumb, keep doubling the accumulated
>>>> value until some unreasonable amount is reached, like the equivalent
>>>> of already used 512 MiB).
>>>> 
>>>> Hope this helps.
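
For reference, the doubling rule of thumb quoted above can be written as a small shell helper. This is only a sketch: the function name memlock_candidates is made up for illustration, and the start and cap values are the 32 MiB and 512 MiB figures mentioned in this thread.

```shell
#!/bin/sh
# Print candidate values for --ulimit memlock, doubling from a start value
# until a cap is passed. The function name is hypothetical; 33554432 (32 MiB)
# and 536870912 (512 MiB) are the figures quoted in this thread.
memlock_candidates() {
    value="$1"
    cap="$2"
    while [ "$value" -le "$cap" ]; do
        echo "$value"
        value=$((value * 2))
    done
}
```

Running memlock_candidates 33554432 536870912 lists the values to try in turn as --ulimit memlock=<value> until corosync starts; the last one is the 536870912 that worked for me.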
>>> 
>>> This makes a lot of sense to me. As Poki pointed out earlier, in
>>> corosync 2.4.3 (I think) we fixed a regression that caused corosync
>>> NOT to be locked in RAM after it forked - which was causing potential
>>> performance issues. So if you replace an earlier corosync with 2.4.3 or
>>> later then it will use more locked memory than before.
>>> 
>>> Chrissie
>>> 
>>> 
>>>> 
>>>>> The reason I am asking is that, in the future, we may deploy our
>>>>> production cluster in a containerised way (for the moment it is just
>>>>> an idea). This would save a lot of time in developing, maintaining
>>>>> and deploying our patch system. All prerequisites and dependencies
>>>>> would be enclosed in the container, and if the IT team does some
>>>>> maintenance on bare metal (i.e. installs new dependencies) it will
>>>>> not affect our containers. I do not see many performance drawbacks
>>>>> in using containers. The point is to understand whether a
>>>>> containerised approach could save us a lot of headaches in
>>>>> maintaining this cluster without affecting performance too much. I
>>>>> see this approach used in many contexts in Cloud environments.
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Users mailing list: Users at clusterlabs.org
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
