[ClusterLabs] Upgrade corosync problem
Salvatore D'angelo
sasadangelo at gmail.com
Fri Jul 6 12:24:04 UTC 2018
Hi All,
The option --ulimit memlock=536870912 worked fine.
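For anyone reproducing this: 536870912 bytes is 512 MiB, which is where the doubling rule of thumb quoted further down ends up when started from 33554432. Below is a minimal sketch of that arithmetic, plus a way to compare the value with what a shell inside the container reports. The helper names are made up for illustration, and I am assuming `ulimit -l` prints KiB, as bash does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of docker or corosync.

# Convert a byte count, as passed to `docker run --ulimit memlock=`,
# into KiB, the unit in which `ulimit -l` usually reports the limit.
memlock_bytes_to_kib() {
    echo $(( $1 / 1024 ))
}

# List the doubling candidates from a start value up to a ceiling (bytes).
memlock_candidates() {
    value=$1
    ceiling=$2
    while [ "$value" -le "$ceiling" ]; do
        echo "$value"
        value=$(( value * 2 ))
    done
}

memlock_candidates 33554432 536870912
echo "512 MiB limit in KiB: $(memlock_bytes_to_kib 536870912)"
# Assumption: the shell supports `ulimit -l`; fall back to "unknown" if not.
echo "limit seen by this shell: $(ulimit -l 2>/dev/null || echo unknown)"
```

Run inside the container, the last line should match the KiB figure if the --ulimit option took effect.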
I now have another strange issue. The upgrade without updating libqb (leaving it at 0.16.0) worked fine.
If, after the upgrade, I stop pacemaker and corosync, download the latest libqb version:
https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.gz
and build and install it, everything works fine.
However, if I install the following in sequence (after the installation of the old code):
libqb 1.0.3
corosync 2.4.4
pacemaker 1.1.18
crmsh 3.0.1
resource agents 4.1.1
when I try to start corosync I get the following error:
Starting Corosync Cluster Engine (corosync): /etc/init.d/corosync: line 99: 8470 Aborted $prog $COROSYNC_OPTIONS > /dev/null 2>&1
[FAILED]
If I launch corosync -f I get:
corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && __start___verbose != __stop___verbose' failed.
Nothing is logged (even in debug mode).
I do not understand why installing libqb during the normal upgrade process fails, while upgrading it after the crmsh/pacemaker/corosync/resource-agents upgrade works fine.
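In case it helps with diagnosis, here is a minimal sketch of how one could compare the two install orders. It shows which libqb the dynamic linker resolves for a binary and whether that binary has an ELF section named `__verbose`, which is the section the `__start___verbose` / `__stop___verbose` symbols in the assertion would delimit under the usual GNU linker naming convention. The function name is made up, and I am assuming `ldd` and `readelf` are available in the container; this is not a confirmed explanation of the failure.

```shell
#!/bin/sh
# Hypothetical helper: report which libqb a binary resolves to at run time
# and whether the binary itself carries a __verbose ELF section.
check_libqb_link() {
    bin=$1
    if [ ! -x "$bin" ]; then
        echo "not an executable: $bin" >&2
        return 2
    fi
    echo "libqb resolved by the dynamic linker:"
    ldd "$bin" | grep libqb || echo "  (no libqb dependency found)"
    echo "__verbose section in $bin:"
    readelf -S "$bin" | grep __verbose || echo "  (none found)"
}

# Example invocation (assumes corosync is on PATH):
#   check_libqb_link "$(command -v corosync)"
```

Running it against the corosync binary from the working order and from the failing order should show whether the two builds differ in that section.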
On 3 Jul 2018, at 11:42, Christine Caulfield <ccaulfie at redhat.com> wrote:
>
> On 03/07/18 07:53, Jan Pokorný wrote:
>> On 02/07/18 17:19 +0200, Salvatore D'angelo wrote:
>>> Today I tested the two suggestions you gave me. Here is what I did
>>> in the script where I create my 5-machine cluster (I use three
>>> nodes for the pacemaker PostgreSQL cluster and two nodes for glusterfs,
>>> which we use for database backups and WAL files).
>>>
>>> FIRST TEST
>>> ----------
>>> I added --shm-size=512m to the “docker create” command. I noticed
>>> that as soon as I start it the shm size is 512m and I didn’t need to
>>> add the entry in /etc/fstab. However, I did it anyway:
>>>
>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>
>>> and then
>>> mount -o remount /dev/shm
>>>
>>> Then I uninstalled all pieces of software (crmsh, resource agents,
>>> corosync and pacemaker) and installed the new one.
>>> Started corosync and pacemaker but same problem occurred.
>>>
>>> SECOND TEST
>>> -----------
>>> stopped corosync and pacemaker
>>> uninstalled corosync
>>> built corosync with --enable-small-memory-footprint and installed it
>>> started corosync and pacemaker
>>>
>>> IT WORKED.
>>>
>>> I would like to understand now why it didn’t work in the first test
>>> and why it worked in the second. Which kind of memory is used too much
>>> here? /dev/shm does not seem to be the problem: I allocated 512m on all
>>> three docker images (obviously on my single Mac) and enabled the container
>>> option as you suggested. Am I missing something here?
>>
>> My suspicion then fully shifts towards "maximum number of bytes of
>> memory that may be locked into RAM" per-process resource limit as
>> raised in one of the most recent messages ...
>>
>>> For the moment I want to use Docker only for test purposes, so it
>>> could be OK to use --enable-small-memory-footprint, but is there
>>> something I can do to have corosync working even without this
>>> option?
>>
>> ... so try running the container the already suggested way:
>>
>> docker run ... --ulimit memlock=33554432 ...
>>
>> or possibly higher (as a rule of thumb, keep doubling the accumulated
>> value until some unreasonable amount is reached, like the equivalent
>> of already used 512 MiB).
>>
>> Hope this helps.
>
> This makes a lot of sense to me. As Poki pointed out earlier, in
> corosync 2.4.3 (I think) we fixed a regression that caused corosync
> NOT to be locked in RAM after it forked - which was causing potential
> performance issues. So if you replace an earlier corosync with 2.4.3 or
> later then it will use more locked memory than before.
>
> Chrissie
>
>
>>
>>> The reason I am asking this is that, in the future, we might
>>> deploy our cluster in production in a containerised way
>>> (for the moment it is just an idea). This would save a lot of time in
>>> developing, maintaining and deploying our patch system. All
>>> prerequisites and dependencies would be enclosed in the container, and if
>>> the IT team does some maintenance on bare metal (i.e. installs new
>>> dependencies) it will not affect our containers. I do not see a lot
>>> of performance drawbacks in using containers. The point is to
>>> understand whether a containerised approach could save us a lot of
>>> headaches in maintaining this cluster without affecting performance too
>>> much. I see this approach used in Cloud environments in a lot of
>>> contexts.
>>
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>> https://lists.clusterlabs.org/mailman/listinfo/users <https://lists.clusterlabs.org/mailman/listinfo/users>
>>
>> Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
>> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/>
>>
>