Hi,

I have tried with:
0.16.0.real-1ubuntu4
0.16.0.real-1ubuntu5

Which version should I try?
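For the libqb downgrade suggested below, a minimal sketch with apt looks like
this (it assumes the runtime package on this Ubuntu image is named libqb0 and
that the older build is still available from the configured repositories;
adjust the package name and version string to whatever apt-cache reports):

    # list the libqb builds apt knows about
    apt-cache policy libqb0

    # install a specific older build and keep apt from upgrading it again
    apt-get install libqb0=0.16.0.real-1ubuntu4
    apt-mark hold libqb0

Corosync has to be restarted afterwards so that it picks up the downgraded
library.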
On 26 Jun 2018, at 12:03, Christine Caulfield <ccaulfie@redhat.com> wrote:

> On 26/06/18 11:00, Salvatore D'angelo wrote:
>> Consider that the container is the same one where corosync 2.3.5 runs.
>> If it is something related to the container, then 2.4.4 probably
>> introduced a feature that has an impact on containers.
>> It should be something related to libqb, according to the code.
>> Can anyone help?
>
> Have you tried downgrading libqb to the previous version to see if it
> still happens?
>
> Chrissie
>
>> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaulfie@redhat.com> wrote:
>>
>>> On 26/06/18 10:35, Salvatore D'angelo wrote:
>>>> Sorry, after the command:
>>>>
>>>>     corosync-quorumtool -ps
>>>>
>>>> the errors in the log are still visible. Looking at the source code,
>>>> it seems the problem is in these lines:
>>>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>>>
>>>>     if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>>>             fprintf(stderr, "Cannot initialize QUORUM service\n");
>>>>             q_handle = 0;
>>>>             goto out;
>>>>     }
>>>>
>>>>     if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>>>             fprintf(stderr, "Cannot initialise CFG service\n");
>>>>             c_handle = 0;
>>>>             goto out;
>>>>     }
>>>>
>>>> The quorum_initialize function is defined here:
>>>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>>>
>>>> It seems to interact with libqb to allocate space on /dev/shm, but
>>>> something fails. I tried to update libqb with apt-get install, but
>>>> with no success.
>>>>
>>>> The same goes for the second function:
>>>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>>>
>>>> Now, I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
>>>>
>>>> The folder /dev/shm has 777 permissions, like on the other nodes with
>>>> the older corosync and pacemaker that work fine. The only difference
>>>> is that I only see files created by root, and none created by
>>>> hacluster as on the other two nodes (probably because pacemaker did
>>>> not start correctly).
>>>>
>>>> This is the analysis I have done so far.
>>>> Any suggestion?
>>>
>>> Hmm. It seems very likely to be something to do with the way the
>>> container is set up then - and I know nothing about containers. Sorry :/
>>>
>>> Can anyone else help here?
>>>
>>> Chrissie
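As a quick way to see what libqb is actually able to create there, a small
shell check like the one below can be run on the failing container and on a
working node and compared (plain shell, nothing corosync-specific; the qb-*
names are the ring-buffer files mentioned in the logs quoted further down):

    # size and current usage of the shared-memory mount
    df -h /dev/shm
    mount | grep '/dev/shm'

    # ring-buffer files created by corosync/libqb, with owner and size;
    # on a healthy node there are files owned by root (corosync) and,
    # once pacemaker is up, by hacluster as well
    ls -l /dev/shm/qb-*

    # rough total consumed by those buffers
    du -ch /dev/shm/qb-* | tail -1

In this thread the errors disappeared once /dev/shm was enlarged, so
comparing these numbers against the 64M default is a reasonable first check.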
>>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadangelo@gmail.com> wrote:
>>>>
>>>>> Yes, sorry, you're right, I could have found it by myself.
>>>>> However, I did the following:
>>>>>
>>>>> 1. Added the line you suggested to /etc/fstab
>>>>> 2. mount -o remount /dev/shm
>>>>> 3. Now I correctly see a 512M /dev/shm with df -h:
>>>>>
>>>>>     Filesystem      Size  Used Avail Use% Mounted on
>>>>>     overlay          63G   11G   49G  19% /
>>>>>     tmpfs            64M  4.0K   64M   1% /dev
>>>>>     tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>>     osxfs           466G  158G  305G  35% /Users
>>>>>     /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>>     shm             512M   15M  498M   3% /dev/shm
>>>>>     tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>>     tmpfs           128M     0  128M   0% /tmp
>>>>>
>>>>> The errors in the log went away. Consider that I removed the log file
>>>>> before starting corosync, so it does not contain lines from previous
>>>>> executions.
>>>>> <corosync.log>
>>>>>
>>>>> But the command:
>>>>>
>>>>>     corosync-quorumtool -ps
>>>>>
>>>>> still gives:
>>>>>
>>>>>     Cannot initialize QUORUM service
>>>>>
>>>>> Consider that a few minutes earlier it gave me the message:
>>>>>
>>>>>     Cannot initialise CFG service
>>>>>
>>>>> I do not know the difference between CFG and QUORUM in this case.
>>>>>
>>>>> If I try to start pacemaker, the service starts, but I see only
>>>>> pacemaker, and the transport does not work if I try to run a crm
>>>>> command.
>>>>> Any suggestion?
>>>>>
>>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaulfie@redhat.com> wrote:
>>>>>
>>>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yes, I am reproducing only the part required for the test. I think
>>>>>>> the original system has a larger shm. The problem is that I do not
>>>>>>> know exactly how to change it.
>>>>>>> I tried the following steps, but I have the impression I did not
>>>>>>> perform the right one:
>>>>>>>
>>>>>>> 1. Removed everything under /tmp
>>>>>>> 2. Added the following line to /etc/fstab:
>>>>>>>    tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>>>>>> 3. mount /tmp
>>>>>>> 4. df -h
>>>>>>>
>>>>>>>     Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>     overlay          63G   11G   49G  19% /
>>>>>>>     tmpfs            64M  4.0K   64M   1% /dev
>>>>>>>     tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>>     osxfs           466G  158G  305G  35% /Users
>>>>>>>     /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>>>>     shm              64M   11M   54M  16% /dev/shm
>>>>>>>     tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>>>>     tmpfs           128M     0  128M   0% /tmp
>>>>>>>
>>>>>>> The errors are exactly the same.
>>>>>>> I have the impression that I changed the wrong parameter. Probably
>>>>>>> I have to change:
>>>>>>>
>>>>>>>     shm              64M   11M   54M  16% /dev/shm
>>>>>>>
>>>>>>> but I do not know how to do that. Any suggestion?
>>>>>>
>>>>>> According to Google, you just add a new line to /etc/fstab for /dev/shm:
>>>>>>
>>>>>>     tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>>>>
>>>>>> Chrissie
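Putting the two pieces together, the sequence that eventually worked further
up in this thread was roughly the following (512m is just the value used
here; per Chrissie's later advice anything from 128M upwards should do for a
small test cluster):

    # /etc/fstab - give /dev/shm a fixed, larger size
    tmpfs   /dev/shm   tmpfs   defaults,size=512m   0   0

    # apply it without rebooting, then verify
    mount -o remount /dev/shm
    df -h /dev/shm

Note that this resizes the tmpfs that is already mounted on /dev/shm; the
earlier attempt above only created a separate 128M tmpfs on /tmp, which is
why the corosync errors did not change.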
>>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaulfie@redhat.com> wrote:
>>>>>>>
>>>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Let me add one important detail here. I use Docker for my tests,
>>>>>>>>> with 5 containers deployed on my Mac.
>>>>>>>>> Basically, the team that worked on this project installed the
>>>>>>>>> cluster on SoftLayer bare metal.
>>>>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>>>>> Testing it was cumbersome, considering that we access the machines
>>>>>>>>> through a complex system that is hard to describe here.
>>>>>>>>> For this reason I ported the cluster to Docker for test purposes.
>>>>>>>>> I am not interested in having it run for months; I just need a
>>>>>>>>> proof of concept.
>>>>>>>>>
>>>>>>>>> When the migration works I will port everything to bare metal,
>>>>>>>>> where resources are abundant.
>>>>>>>>>
>>>>>>>>> I have enough RAM and disk space on my Mac, so if you tell me what
>>>>>>>>> an acceptable size would be for several days of running, that is
>>>>>>>>> fine for me.
>>>>>>>>> It would also be fine to have commands to clean the shm when
>>>>>>>>> required.
>>>>>>>>> I know I can find them on Google, but if you can suggest this
>>>>>>>>> information I will appreciate it. I have the OS knowledge to do
>>>>>>>>> it, but I would like to avoid days of guesswork and trial and
>>>>>>>>> error if possible.
>>>>>>>>
>>>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you
>>>>>>>> can spare it. My 'standard' system uses 75MB under normal running,
>>>>>>>> allowing for one command-line query to run.
>>>>>>>>
>>>>>>>> If I read this right, then you're reproducing a bare-metal system
>>>>>>>> in containers now? So the original systems will have a default
>>>>>>>> /dev/shm size which is probably much larger than your containers?
>>>>>>>>
>>>>>>>> I'm just checking here that we don't have a regression in memory
>>>>>>>> usage, as Poki suggested.
>>>>>>>>
>>>>>>>> Chrissie
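Since the nodes here are Docker containers, the cleanest way to get a larger
/dev/shm is to set it when the container is created rather than remounting
inside it. A sketch for plain docker run (docker-compose has an equivalent
shm_size: key):

    # give the container a 512M /dev/shm at creation time
    docker run --shm-size=512m [other options] <image>

As for cleaning up: the qb-* files under /dev/shm are the corosync/libqb
ring buffers seen in the logs, so with corosync and pacemaker stopped they
can simply be deleted, e.g.

    rm -f /dev/shm/qb-*

Do not remove them while the daemons are running, since they keep those
buffers mapped.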
>>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpokorny@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>>>> Thanks for the reply. I scratched my cluster and created it
>>>>>>>>>>> again, then migrated as before. This time I uninstalled
>>>>>>>>>>> pacemaker, corosync, crmsh and the resource agents with
>>>>>>>>>>> make uninstall, then I installed the new packages.
>>>>>>>>>>> The problem is the same. When I launch:
>>>>>>>>>>>
>>>>>>>>>>>     corosync-quorumtool -ps
>>>>>>>>>>>
>>>>>>>>>>> I get: Cannot initialize QUORUM service
>>>>>>>>>>>
>>>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>>>
>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>>>> [18019] pg3 corosyncdebug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>>>>> [18019] pg3 corosyncerror   [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>>>
>>>>>>>>>>> I tried to check /dev/shm, and I am not sure these are the right
>>>>>>>>>>> commands, however:
>>>>>>>>>>>
>>>>>>>>>>>     df -h /dev/shm
>>>>>>>>>>>     Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>>     shm              64M   16M   49M  24% /dev/shm
>>>>>>>>>>>
>>>>>>>>>>>     ls /dev/shm
>>>>>>>>>>>     qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>>>>>>     qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>>>>>>
>>>>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the
>>>>>>>>>>> previous corosync release?
>>>>>>>>>>
>>>>>>>>>> For a start, can you try configuring corosync with the
>>>>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>>>>
>>>>>>>>>> It is hard to say why the space provisioned to /dev/shm is the
>>>>>>>>>> direct opposite of generous (by today's standards), but it may be
>>>>>>>>>> the result of automatic HW adaptation, and if RAM is so scarce in
>>>>>>>>>> your case, the above build-time toggle might help.
>>>>>>>>>>
>>>>>>>>>> If not, then exponentially increasing the size of /dev/shm is
>>>>>>>>>> likely your best bet (I don't recommend fiddling with mlockall()
>>>>>>>>>> and similar measures in corosync).
>>>>>>>>>>
>>>>>>>>>> Of course, feel free to raise a regression if you have a
>>>>>>>>>> reproducible comparison between two corosync versions (plus
>>>>>>>>>> possibly different libraries like libqb), one that works and one
>>>>>>>>>> that doesn't, in reproducible conditions (like this small
>>>>>>>>>> /dev/shm, VM image, etc.).
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jan (Poki)
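For completeness, since corosync is being rebuilt from source here anyway,
the small-memory-footprint build Poki mentions is just a configure switch.
A sketch, assuming a checkout of the corosync source tree:

    ./autogen.sh
    ./configure --enable-small-memory-footprint
    make
    make install

It reduces the sizes of the IPC/ring buffers corosync asks libqb for, so the
same setup should need noticeably less space in /dev/shm; the trade-off is
smaller message buffers, which is usually acceptable for a test cluster like
this one.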
class="">https://lists.clusterlabs.org/mailman/listinfo/users</a><br class=""><br class="">Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/><br class="">Getting started:<br class="">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf<br class="">Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/><br class=""></blockquote><br class=""></blockquote><br class=""><br class=""><br class="">_______________________________________________<br class="">Users mailing list: <a href="mailto:Users@clusterlabs.org" class="">Users@clusterlabs.org</a> <<a href="mailto:Users@clusterlabs.org" class="">mailto:Users@clusterlabs.org</a>><br class=""><a href="https://lists.clusterlabs.org/mailman/listinfo/users" class="">https://lists.clusterlabs.org/mailman/listinfo/users</a><br class=""><br class="">Project Home: http://www.clusterlabs.org<br class="">Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf<br class="">Bugs: http://bugs.clusterlabs.org<br class=""><br class=""></blockquote><br class="">_______________________________________________<br class="">Users mailing list: <a href="mailto:Users@clusterlabs.org" class="">Users@clusterlabs.org</a> <<a href="mailto:Users@clusterlabs.org" class="">mailto:Users@clusterlabs.org</a>><br class=""><a href="https://lists.clusterlabs.org/mailman/listinfo/users" class="">https://lists.clusterlabs.org/mailman/listinfo/users</a><br class=""><br class="">Project Home: http://www.clusterlabs.org<br class="">Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf<br class="">Bugs: http://bugs.clusterlabs.org<br class=""></blockquote><br class=""><br class=""><br class="">_______________________________________________<br class="">Users mailing list: <a href="mailto:Users@clusterlabs.org" class="">Users@clusterlabs.org</a><br class=""><a href="https://lists.clusterlabs.org/mailman/listinfo/users" class="">https://lists.clusterlabs.org/mailman/listinfo/users</a><br class=""><br class="">Project Home: http://www.clusterlabs.org<br class="">Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf<br class="">Bugs: http://bugs.clusterlabs.org<br class=""><br class=""></blockquote><br class="">_______________________________________________<br class="">Users mailing list: <a href="mailto:Users@clusterlabs.org" class="">Users@clusterlabs.org</a><br class=""><a href="https://lists.clusterlabs.org/mailman/listinfo/users" class="">https://lists.clusterlabs.org/mailman/listinfo/users</a><br class=""><br class="">Project Home: http://www.clusterlabs.org<br class="">Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf<br class="">Bugs: http://bugs.clusterlabs.org<br class=""></div></div></blockquote></div><br class=""></body></html>