[ClusterLabs] Upgrade corosync problem
sasadangelo at gmail.com
Fri Jul 6 08:24:04 EDT 2018
The option --ulimit memlock=536870912 worked fine.
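For reference, a minimal sketch of the container setup that worked; the limit values come from this thread, while the image name and any other flags are placeholders:

```shell
# 512 MiB expressed in bytes, matching the memlock value that worked
MEMLOCK=$((512 * 1024 * 1024))
echo "$MEMLOCK"
# Hypothetical invocation; image name and remaining flags are placeholders:
# docker create --shm-size=512m --ulimit memlock=$MEMLOCK some-image ...
```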
Now I have another strange issue. The upgrade without updating libqb (leaving 0.16.0 in place) worked fine.
If, after the upgrade, I stop pacemaker and corosync, then download, build, and install the latest libqb version, everything works fine.
If instead I try to install everything in sequence (after the installation of the old code):
resource agents 4.1.1
when I try to start corosync, I get the following error:
Starting Corosync Cluster Engine (corosync): /etc/init.d/corosync: line 99: 8470 Aborted $prog $COROSYNC_OPTIONS > /dev/null 2>&1
If I launch corosync -f, I get:
corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && __start___verbose != __stop___verbose' failed.
Nothing is logged (even in debug mode).
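That assertion suggests the corosync binary's __verbose callsite section is empty, which usually points at a mismatch between the libqb it was built against and the one now installed. One plausible way to check, assuming /usr/sbin/corosync as the install path:

```shell
# Check whether a binary carries libqb's callsite section symbols.
# /usr/sbin/corosync is an assumed install path; adjust as needed.
BIN=/usr/sbin/corosync
if [ -e "$BIN" ] && nm "$BIN" 2>/dev/null | grep -q '__start___verbose'; then
    echo "callsite section symbols present"
else
    echo "callsite section symbols missing or binary not found"
fi
CHECKED=yes
```

If the symbols are missing, rebuilding corosync against the installed libqb is the usual remedy.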
I do not understand why installing libqb during the normal upgrade process fails, while upgrading it after the crmsh/pacemaker/corosync/resource-agents upgrade works fine.
On 3 Jul 2018, at 11:42, Christine Caulfield <ccaulfie at redhat.com> wrote:
> On 03/07/18 07:53, Jan Pokorný wrote:
>> On 02/07/18 17:19 +0200, Salvatore D'angelo wrote:
>>> Today I tested the two suggestions you gave me. Here is what I did.
>>> In the script where I create my 5 machines cluster (I use three
>>> nodes for pacemaker PostgreSQL cluster and two nodes for glusterfs
>>> that we use for database backup and WAL files).
>>> FIRST TEST
>>> I added the --shm-size=512m option to the "docker create" command.
>>> I noticed that as soon as I start the container the shm size is 512m,
>>> so I didn't need to add the entry in /etc/fstab. However, I did it anyway:
>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>> and then
>>> mount -o remount /dev/shm
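A quick way to confirm the remount took effect (assuming a Linux container with /dev/shm mounted):

```shell
# Show the size of the shared-memory mount; the Size column should read 512M
df -h /dev/shm
```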
>>> Then I uninstalled all the pieces of software (crmsh, resource agents,
>>> corosync, and pacemaker) and installed the new ones.
>>> Started corosync and pacemaker, but the same problem occurred.
>>> SECOND TEST
>>> stopped corosync and pacemaker
>>> uninstalled corosync
>>> built corosync with --enable-small-memory-footprint and installed it
>>> started corosync and pacemaker
>>> IT WORKED.
>>> I would now like to understand why it didn't work in the first test
>>> and why it worked in the second. Which kind of memory is used too much
>>> here? /dev/shm does not seem to be the problem: I allocated 512m on all
>>> three docker images (obviously on my single Mac) and enabled the
>>> container option as you suggested. Am I missing something here?
>> My suspicion then fully shifts towards the "maximum number of bytes of
>> memory that may be locked into RAM" per-process resource limit, as
>> raised in one of the most recent messages ...
>>> The reason I am asking is that, for the moment, I want to use Docker
>>> only for test purposes, so it could be OK to use
>>> --enable-small-memory-footprint, but is there something I can do to
>>> have corosync working even without this
>> ... so try running the container the already suggested way:
>> docker run ... --ulimit memlock=33554432 ...
>> or possibly higher (as a rule of thumb, keep doubling the value until
>> some unreasonable amount is reached, like the equivalent of the
>> 512 MiB already in use).
>> Hope this helps.
> This makes a lot of sense to me. As Poki pointed out earlier, in
> corosync 2.4.3 (I think) we fixed a regression that caused corosync
> NOT to be locked in RAM after it forked, which was causing potential
> performance issues. So if you replace an earlier corosync with 2.4.3 or
> later, it will use more locked memory than before.
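To see how much memory a process actually has locked, Linux exposes a VmLck counter in /proc; for a running corosync one could check it like this (the pidof usage is an assumption about the environment):

```shell
# VmLck in /proc/<pid>/status shows a process's locked-memory footprint.
# For corosync it would be: grep VmLck /proc/$(pidof corosync)/status
# Demonstrated here on the current shell process:
grep VmLck /proc/self/status
```

Comparing this value before and after the upgrade would show whether the new corosync's mlockall behaviour is what trips the container limit.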
>>> The reason I am asking this is that, in the future, we could possibly
>>> deploy our production cluster in a containerised way (for the moment
>>> it is just an idea). This would save a lot of time in developing,
>>> maintaining, and deploying our patch system. All prerequisites and
>>> dependencies would be enclosed in the container, and if the IT team
>>> does some maintenance on bare metal (i.e. installs new dependencies)
>>> it will not affect our containers. I do not see a lot of performance
>>> drawbacks in using containers. The point is to understand whether a
>>> containerised approach could save us a lot of headache in maintaining
>>> this cluster without affecting performance too much. I have noticed
>>> in Cloud environments this approach in a lot of
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org