[ClusterLabs] Upgrade corosync problem
ccaulfie at redhat.com
Fri Jul 6 08:40:19 EDT 2018
On 06/07/18 13:24, Salvatore D'angelo wrote:
> Hi All,
> The option --ulimit memlock=536870912 worked fine.
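[Editorial note: that memlock value is easier to read in MiB; a quick shell check, showing it is exactly the 512 MiB later given to /dev/shm:]

```shell
# 536870912 bytes is exactly 512 MiB -- the same size later used for
# /dev/shm with --shm-size=512m.
bytes=536870912
echo "$((bytes / 1024 / 1024)) MiB"
```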
> I have now another strange issue. The upgrade without updating libqb
> (leaving 0.16.0 in place) worked fine.
> If, after the upgrade, I stop pacemaker and corosync, download the
> latest libqb version, and build and install it, everything works fine.
> If I try to install in sequence (after the installation of old code):
> libqb 1.0.3
> corosync 2.4.4
> pacemaker 1.1.18
> crmsh 3.0.1
> resource agents 4.1.1
> when I try to start corosync I got the following error:
> *Starting Corosync Cluster Engine (corosync): /etc/init.d/corosync: line
> 99: 8470 Aborted $prog $COROSYNC_OPTIONS > /dev/null 2>&1*
Yes, you can't randomly swap hand-compiled libqb versions in and out.
Find one that works and stick to it. It's an annoying 'feature' of newer
linkers that we had to work around in libqb. So if you rebuild libqb
1.0.3 then you will, in all likelihood, need to rebuild corosync too.
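[Editorial note: a sketch of the rebuild order this implies, with the versions from the thread; the autotools commands in the comment are the usual sequence, not taken from the original message:]

```shell
# Rebuild order implied above: install the hand-built libqb first (and
# run ldconfig), then rebuild corosync and pacemaker against it, so the
# binaries link against the same libqb.so they will load at runtime.
for pkg in libqb-1.0.3 corosync-2.4.4 pacemaker-1.1.18; do
  echo "rebuild and install $pkg"   # i.e. ./configure && make && make install
done
```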
> if I launch corosync -f I got:
> *corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite
> section is populated, otherwise target's build is at fault, preventing
> reliable logging" && __start___verbose != __stop___verbose' failed.*
> Nothing is logged (even in debug mode).
> I do not understand why installing libqb during the normal upgrade
> process fails, while upgrading it after the
> crmsh/pacemaker/corosync/resource-agents upgrade works fine.
> On 3 Jul 2018, at 11:42, Christine Caulfield <ccaulfie at redhat.com
> <mailto:ccaulfie at redhat.com>> wrote:
>> On 03/07/18 07:53, Jan Pokorný wrote:
>>> On 02/07/18 17:19 +0200, Salvatore D'angelo wrote:
>>>> Today I tested the two suggestions you gave me. Here is what I did,
>>>> in the script where I create my 5-machine cluster (I use three
>>>> nodes for the pacemaker PostgreSQL cluster and two nodes for the
>>>> glusterfs volume we use for database backups and WAL files).
>>>> FIRST TEST
>>>> I added the --shm-size=512m option to the “docker create” command.
>>>> I noticed that as soon as I start the container the shm size is
>>>> 512m and I didn’t need to add the entry in /etc/fstab. However, I
>>>> did it anyway:
>>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>> and then
>>>> mount -o remount /dev/shm
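[Editorial note: a small sketch that pulls the configured size back out of that fstab line; the sed pattern is illustrative, and after the remount `df -h /dev/shm` confirms the live size:]

```shell
# Extract the size= option from the fstab entry above; after the
# remount, 'df -h /dev/shm' should report the same value.
line="tmpfs /dev/shm tmpfs defaults,size=512m 0 0"
size=$(printf '%s\n' "$line" | sed 's/.*size=\([0-9]*[mg]\).*/\1/')
echo "configured /dev/shm size: $size"
```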
>>>> Then I uninstalled all the pieces of software (crmsh, resource
>>>> agents, corosync and pacemaker) and installed the new versions.
>>>> Started corosync and pacemaker but same problem occurred.
>>>> SECOND TEST
>>>> stopped corosync and pacemaker
>>>> uninstalled corosync
>>>> built corosync with --enable-small-memory-footprint and installed it
>>>> started corosync and pacemaker
>>>> IT WORKED.
>>>> I would like to understand now why it didn’t work in the first test
>>>> and why it worked in the second. Which kind of memory is used too much
>>>> here? /dev/shm seems not the problem, I allocated 512m on all three
>>>> docker images (obviously on my single Mac) and enabled the container
>>>> option as you suggested. Am I missing something here?
>>> My suspicion then fully shifts towards "maximum number of bytes of
>>> memory that may be locked into RAM" per-process resource limit as
>>> raised in one of the most recent messages ...
>>>> For the moment I want to use Docker only for test purposes, so it
>>>> could be OK to use --enable-small-memory-footprint, but is there
>>>> something I can do to have corosync working even without this option?
>>> ... so try running the container the already suggested way:
>>> docker run ... --ulimit memlock=33554432 ...
>>> or possibly higher (as a rule of thumb, keep doubling the accumulated
>>> value until some unreasonable amount is reached, like the equivalent
>>> of the already-used 512 MiB).
>>> Hope this helps.
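[Editorial note: the doubling rule above can be sketched as a loop; the docker command lines are only echoed here, and the 32 MiB start and 512 MiB cap come from the thread:]

```shell
# Rule of thumb: start at 32 MiB and keep doubling the memlock value
# until an unreasonable amount is reached -- here the 512 MiB already
# used for /dev/shm.
bytes=33554432                  # 32 MiB
while [ "$bytes" -le 536870912 ]; do
  echo "docker run ... --ulimit memlock=$bytes ..."
  bytes=$((bytes * 2))
done
```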
>> This makes a lot of sense to me. As Poki pointed out earlier, in
>> corosync 2.4.3 (I think) we fixed a regression that caused corosync
>> NOT to be locked in RAM after it forked - which was causing potential
>> performance issues. So if you replace an earlier corosync with 2.4.3 or
>> later then it will use more locked memory than before.
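[Editorial note: a quick way to see the limit such a locked-in-RAM corosync runs under; run inside the container, and note `ulimit -l` reports KiB, or `unlimited`:]

```shell
# Show the per-process locked-memory limit that corosync's mlockall()
# is subject to; Docker sets it with --ulimit memlock=<bytes>.
ulimit -l
```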
>>>> The reason I am asking this is that, in the future, we may deploy
>>>> our cluster in production in a containerised way (for the moment it
>>>> is just an idea). This would save a lot of time in developing,
>>>> maintaining and deploying our patch system. All prerequisites and
>>>> dependencies would be enclosed in the container, so if the IT team
>>>> does some maintenance on bare metal (i.e. installs new
>>>> dependencies) it will not affect our containers. I do not see many
>>>> performance drawbacks in using containers. The point is to
>>>> understand whether a containerised approach could save us a lot of
>>>> maintenance headache for this cluster without affecting performance
>>>> too much. I notice this approach used a lot in Cloud environments.
>>> Users mailing list: Users at clusterlabs.org
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org