<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 23, 2024 at 10:34 AM Klaus Wenninger <<a href="mailto:kwenning@redhat.com">kwenning@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 23, 2024 at 9:53 AM NOLIBOS Christophe <<a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank">christophe.nolibos@thalesgroup.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div lang="FR"><div><p style="margin:0cm 0cm 0.0001pt"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">Classified as: {OPEN}</span><u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Another strange thing.<u></u><u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">On RHEL 7, corosync is restarted even though the “Restart=on-failure” line is commented out.<u></u><u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">I also think that something changed in pacemaker's behavior, or somewhere else.</span></p></div></div></div></blockquote><div><br></div><div>That is how it worked before the reconnection to corosync was introduced.</div><div>Previously pacemaker would fail and systemd would restart it, checking the services</div><div>pacemaker depends on. 
And finding corosync not running it would be restarted.</div></div></div></blockquote><div><br></div><div>From what I've read there has been a change in how systemd is handling restart</div><div>of dependent services a while back as well. So changed behavior can come from</div><div>that as well. Just for completeness ...</div><div><br></div><div>Klaus </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><div>Klaus</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div lang="FR"><div><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u><u></u></span></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p><p class="MsoNormal"><b><span style="font-size:11pt;font-family:Calibri,sans-serif">De :</span></b><span style="font-size:11pt;font-family:Calibri,sans-serif"> Klaus Wenninger <<a href="mailto:kwenning@redhat.com" target="_blank">kwenning@redhat.com</a>> <br><b>Envoyé :</b> lundi 22 avril 2024 12:41<br><b>À :</b> NOLIBOS Christophe <<a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank">christophe.nolibos@thalesgroup.com</a>><br><b>Cc :</b> Cluster Labs - All topics related to open-source clustering welcomed <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br><b>Objet :</b> Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix<u></u><u></u></span></p><p class="MsoNormal"><u></u> <u></u></p><div><div><p class="MsoNormal"><u></u> <u></u></p></div><p class="MsoNormal"><u></u> <u></u></p><div><div><p class="MsoNormal">On Mon, Apr 22, 2024 at 12:32 PM NOLIBOS Christophe <<a href="mailto:christophe.nolibos@thalesgroup.com" 
target="_blank">christophe.nolibos@thalesgroup.com</a>> wrote:<u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><div><div><div><p style="margin:0cm 0cm 0.0001pt"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">Classified as: {OPEN}</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">You are right: the “Restart=on-failure” line is commented out and so disabled by default.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Uncommenting it resolves my issue.</span><u></u><u></u></p></div></div></div></blockquote><div><p class="MsoNormal"><u></u> <u></u></p></div><div><p class="MsoNormal">Maybe pacemaker changed its behavior here without being coordinated closely enough with corosync's behavior.<u></u><u></u></p></div><div><p class="MsoNormal">We'll look into which approach is better: restarting corosync on failure, or having<u></u><u></u></p></div><div><p class="MsoNormal">pacemaker restarted by systemd, which should in turn restart corosync as well.<u></u><u></u></p></div><div><p class="MsoNormal"><u></u> <u></u></p></div><div><p class="MsoNormal">Klaus <u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><div><div><div><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Thanks a lot.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" 
style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Christophe.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">De :</span></b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif"> Klaus Wenninger <</span><a href="mailto:kwenning@redhat.com" target="_blank"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">kwenning@redhat.com</span></a><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">> <br><b>Envoyé :</b> lundi 22 avril 2024 11:06<br><b>À :</b> NOLIBOS Christophe <christophe</span><span style="font-size:11pt;font-family:Calibri,sans-serif">.</span><a href="mailto:nolibos@thalesgroup.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">nolibos@thalesgroup.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Cc :</b> Cluster Labs - All topics related to open-source clustering welcomed <</span><a href="mailto:users@clusterlabs.org" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">users@clusterlabs.org</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Objet :</b> Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><div><div><p class="MsoNormal"> <u></u><u></u></p></div><p class="MsoNormal"> <u></u><u></u></p><div><div><p class="MsoNormal">On Mon, Apr 22, 2024 at 9:51 AM NOLIBOS Christophe <<a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank">christophe.nolibos@thalesgroup.com</a>> wrote:<u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 
4.8pt"><div><div><div><p style="margin:0cm 0cm 0.0001pt"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">Classified as: {OPEN}</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">With the ‘kill -9’ command.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Is that a graceful exit?</span><u></u><u></u></p></div></div></div></blockquote><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">It looks as if the corosync unit file has Restart=on-failure disabled by default.<u></u><u></u></p></div><div><p class="MsoNormal">I'm not aware of another mechanism that would restart corosync, and I<u></u><u></u></p></div><div><p class="MsoNormal">think the default behavior is not to restart.<u></u><u></u></p></div><div><p class="MsoNormal">The comments suggest enabling it only when using a watchdog, but that might just<u></u><u></u></p></div><div><p class="MsoNormal">refer to RestartSec, which is meant to provoke a watchdog reboot instead of a<u></u><u></u></p></div><div><p class="MsoNormal">restart via systemd.<u></u><u></u></p></div><div><p class="MsoNormal">Any signal that isn't handled by the process - so that the exit code could<u></u><u></u></p></div><div><p class="MsoNormal">be set to 0 - should be fine.<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">Klaus<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><div><div><div><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><b><span 
style="font-size:11pt;font-family:Calibri,sans-serif">From:</span></b><span style="font-size:11pt;font-family:Calibri,sans-serif"> Klaus Wenninger <</span><a href="mailto:kwenning@redhat.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">kwenning@redhat.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">> <br><b>Sent:</b> Thursday, April 18, 2024 20:17<br><b>To:</b> NOLIBOS Christophe <</span><a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">christophe.nolibos@thalesgroup.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Cc:</b> Cluster Labs - All topics related to open-source clustering welcomed <</span><a href="mailto:users@clusterlabs.org" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">users@clusterlabs.org</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Subject:</b> Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><div><div><p class="MsoNormal" style="margin-bottom:12pt"> <u></u><u></u></p><div><div><p class="MsoNormal">NOLIBOS Christophe <<a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank">christophe.nolibos@thalesgroup.com</a>> wrote on Thu, Apr 18, 
2024, 19:01:<u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><div><div><p style="margin:0cm 0cm 0.0001pt"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">Classified as: {OPEN}</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Hmm… my RHEL 8.8 OS has been hardened.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">I am wondering whether the problem comes from that.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">On the other hand, I get the same issue (i.e. corosync not restarted by systemd) with Pacemaker 2.1.5-8 deployed on RHEL 8.4 (not hardened).</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">I’m checking.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p></div></div></blockquote></div></div><div><p class="MsoNormal">How did you kill corosync? If it exits gracefully, it might not be restarted. Check the journal. Sorry, I can't try it myself; I'm on my mobile at the moment. 
Klaus<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><div><div><p class="MsoNormal"> <u></u><u></u></p><p align="center" style="margin:0cm 0cm 0.0001pt;text-align:center"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">{OPEN}</span><u></u><u></u></p><div><div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:11pt;font-family:Calibri,sans-serif">De :</span></b><span style="font-size:11pt;font-family:Calibri,sans-serif"> Users <</span><a href="mailto:users-bounces@clusterlabs.org" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">users-bounces@clusterlabs.org</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">> <b>De la part de</b> NOLIBOS Christophe via Users<br><b>Envoyé :</b> jeudi 18 avril 2024 18:34<br><b>À :</b> Klaus Wenninger <</span><a href="mailto:kwenning@redhat.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">kwenning@redhat.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">>; Cluster Labs - All topics 
related to open-source clustering welcomed <</span><a href="mailto:users@clusterlabs.org" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">users@clusterlabs.org</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Cc :</b> NOLIBOS Christophe <</span><a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">christophe.nolibos@thalesgroup.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Objet :</b> Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix</span><u></u><u></u></p></div></div><p class="MsoNormal"> <u></u><u></u></p><p style="margin:0cm 0cm 0.0001pt"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">Classified as: {OPEN}</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">So, the issue is on systemd?</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">If I run the same test on RHEL 7 (3.10.0-693.11.1.el7) with pacemaker 1.1.13-10, corosync is correctly restarted by systemd.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">[RHEL7 ~]# journalctl -f</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">-- Logs begin at Wed 2024-01-03 13:15:41 UTC. 
--</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - systemd[1]: corosync.service failed.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - systemd[1]: pacemaker.service holdoff time over, scheduling restart.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - systemd[1]: Starting Corosync Cluster Engine...</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - corosync[12179]: Starting Corosync Cluster Engine (corosync): [ OK ]</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - systemd[1]: Started Corosync Cluster Engine.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - systemd[1]: Started Pacemaker High Availability Cluster Manager.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - systemd[1]: Starting Pacemaker High Availability Cluster Manager...</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - pacemakerd[12192]: notice: Additional logging available in /var/log/pacemaker.log</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - pacemakerd[12192]: notice: Switching to /var/log/cluster/corosync.log</span><u></u><u></u></p><p 
class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 16:26:55 - pacemakerd[12192]: notice: Additional logging available in /var/log/cluster/corosync.log</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><b><span style="font-size:11pt;font-family:Calibri,sans-serif">De :</span></b><span style="font-size:11pt;font-family:Calibri,sans-serif"> Klaus Wenninger <</span><a href="mailto:kwenning@redhat.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">kwenning@redhat.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">> <br><b>Envoyé :</b> jeudi 18 avril 2024 18:12<br><b>À :</b> NOLIBOS Christophe <</span><a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">christophe.nolibos@thalesgroup.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">>; Cluster Labs - All topics related to open-source clustering welcomed <</span><a href="mailto:users@clusterlabs.org" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">users@clusterlabs.org</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Objet :</b> Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><div><div><p class="MsoNormal"> <u></u><u></u></p></div><p class="MsoNormal"> <u></u><u></u></p><div><div><p class="MsoNormal" style="margin-left:57.75pt">On Thu, Apr 18, 2024 at 6:09 PM Klaus Wenninger <<a href="mailto:kwenning@redhat.com" target="_blank">kwenning@redhat.com</a>> wrote:<u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 
5pt 4.8pt"><div><div><p class="MsoNormal"> <u></u><u></u></p></div><p class="MsoNormal"> <u></u><u></u></p><div><div><p class="MsoNormal" style="margin-left:80.85pt">On Thu, Apr 18, 2024 at 6:06 PM NOLIBOS Christophe <<a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank">christophe.nolibos@thalesgroup.com</a>> wrote:<u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><div><div><div><p style="margin:0cm 0cm 0.0001pt"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">Classified as: {OPEN}</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Well… why do you say “</span><span lang="EN-US">Well if corosync isn't there that this is to be expected and pacemaker won't recover corosync.”?</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">In my mind, Corosync is managed by Pacemaker like any other cluster resource, and the "pacemakerd: recover properly from Corosync crash" fix implemented in version 2.1.2 seems to confirm that.</span><u></u><u></u></p></div></div></div></blockquote><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">Nope. Startup of the stack is done by systemd. 
And pacemaker is just started after corosync is up and<u></u><u></u></p></div><div><p class="MsoNormal">systemd should be responsible for keeping the stack up.<u></u><u></u></p></div><div><p class="MsoNormal">For completeness: if you have sbd in the mix that is as well being started by systemd but kind of<u></u><u></u></p></div><div><p class="MsoNormal">parallel with corosync as part of it (systemd terminology).<u></u><u></u></p></div></div></div></blockquote><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">The "recover" above is referring to pacemaker recovering from corosync going away and coming back.<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><div><div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">Klaus <u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><div><div><div><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p align="center" style="margin:0cm 0cm 0.0001pt;text-align:center"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">{OPEN}</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p align="center" style="margin:0cm 0cm 0.0001pt;text-align:center"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">{OPEN}</span><u></u><u></u></p><div><div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:11pt;font-family:Calibri,sans-serif">De :</span></b><span 
style="font-size:11pt;font-family:Calibri,sans-serif"> NOLIBOS Christophe <br><b>Envoyé :</b> jeudi 18 avril 2024 17:56<br><b>À :</b> 'Klaus Wenninger' <</span><a href="mailto:kwenning@redhat.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">kwenning@redhat.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">>; Cluster Labs - All topics related to open-source clustering welcomed <</span><a href="mailto:users@clusterlabs.org" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">users@clusterlabs.org</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Cc :</b> Ken Gaillot <</span><a href="mailto:kgaillot@redhat.com" target="_blank"><span style="font-size:11pt;font-family:Calibri,sans-serif">kgaillot@redhat.com</span></a><span style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Objet :</b> RE: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix</span><u></u><u></u></p></div></div><p class="MsoNormal"> <u></u><u></u></p><p style="margin:0cm 0cm 0.0001pt"><span style="font-size:10pt;font-family:Calibri,sans-serif;color:black">Classified as: {OPEN}</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">[~]$ systemctl status corosync</span><u></u><u></u></p><p class="MsoNormal" style="margin-left:161.7pt"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">● corosync.service - Corosync Cluster Engine</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: 
disabled)</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> Active: failed (Result: signal) since Thu 2024-04-18 14:58:42 UTC; 53min ago</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> Docs: man:corosync</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> man:corosync.conf</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> man:corosync_overview</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> Process: 2027251 ExecStop=/usr/sbin/corosync-cfgtool -H --force (code=exited, status=0/SUCCESS)</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> Process: 1324906 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=killed, signal=KILL)</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Main PID: 1324906 (code=killed, signal=KILL)</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 13:16:04 - corosync[1324906]: [QUORUM] Sync joined[1]: 1</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 13:16:04 - corosync[1324906]: [TOTEM ] A new membership (1.1c8) was formed. 
Members joined: 1</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 13:16:04 - corosync[1324906]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 13:16:04 - corosync[1324906]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 13:16:04 - corosync[1324906]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 13:16:04 - corosync[1324906]: [QUORUM] Members[1]: 1</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 13:16:04 - corosync[1324906]: [MAIN ] Completed service synchronization, ready to provide service.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 13:16:04 - systemd[1]: Started Corosync Cluster Engine.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 14:58:42 - systemd[1]: corosync.service: Main process exited, code=killed, status=9/KILL</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Apr 18 14:58:42 - systemd[1]: corosync.service: Failed with result 'signal'.</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" 
style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">[~]$</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"> </span><u></u><u></u></p><p class="MsoNormal"><b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">De :</span></b><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif"> Klaus Wenninger <</span><a href="mailto:kwenning@redhat.com" target="_blank"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">kwenning@redhat.com</span></a><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">> <br><b>Envoyé :</b> jeudi 18 avril 2024 17:43<br><b>À :</b> Cluster Labs - All topics related to open-source clustering welcomed <</span><a href="mailto:users@clusterlabs.org" target="_blank"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">users@clusterlabs.org</span></a><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Cc :</b> Ken Gaillot <</span><a href="mailto:kgaillot@redhat.com" target="_blank"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">kgaillot@redhat.com</span></a><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">>; NOLIBOS Christophe <</span><a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">christophe.nolibos@thalesgroup.com</span></a><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif">><br><b>Objet :</b> Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix</span><u></u><u></u></p><p class="MsoNormal"><span lang="EN-US"> </span><u></u><u></u></p><div><div><p class="MsoNormal"><span lang="EN-US"> 
</span><u></u><u></u></p></div><p class="MsoNormal"><span lang="EN-US"> </span><u></u><u></u></p><div><div><p class="MsoNormal" style="margin-left:184.8pt">On Thu, Apr 18, 2024 at 5:07 PM NOLIBOS Christophe via Users <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>> wrote:<u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><p class="MsoNormal" style="margin-bottom:12pt">Classified as: {OPEN}<br><br>I'm using RedHat 8.8 (4.18.0-477.21.1.el8_8.x86_64).<br>When I kill Corosync, no new corosync process is started and Pacemaker is left in a non-working state.<br>The only way to recover is to restart the pacemaker service.<br><br>[~]$ pcs status<br>Error: unable to get cib<br>[~]$<br><br>[~]$ systemctl status pacemaker<br>● pacemaker.service - Pacemaker High Availability Cluster Manager<br> Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)<br> Active: active (running) since Thu 2024-04-18 13:16:04 UTC; 1h 43min ago<br> Docs: man:pacemakerd<br> <a href="https://clusterlabs.org/pacemaker/doc/" target="_blank">https://clusterlabs.org/pacemaker/doc/</a><br> Main PID: 1324923 (pacemakerd)<br> Tasks: 91<br> Memory: 132.1M<br> CGroup: /system.slice/pacemaker.service<br>...<br>Apr 18 14:59:02 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:03 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:04 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:05 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:06 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:07 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:08 - pacemakerd[1324923]: crit: 
Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:09 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:10 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>Apr 18 14:59:11 - pacemakerd[1324923]: crit: Could not connect to Corosync CFG: CS_ERR_LIBRARY<br>[~]$<u></u><u></u></p></blockquote><div><p class="MsoNormal">Well, if corosync isn't there, then this is to be expected, and pacemaker won't recover corosync.<u></u><u></u></p></div><div><p class="MsoNormal">Can you check what systemd thinks about corosync (status/journal)? <u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">Klaus<u></u><u></u></p></div><blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt"><p class="MsoNormal"><br>{OPEN}<br><br>-----Original Message-----<br>From: Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>> <br>Sent: Thursday, April 18, 2024 16:40<br>To: Cluster Labs - All topics related to open-source clustering welcomed <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br>Cc: NOLIBOS Christophe <<a href="mailto:christophe.nolibos@thalesgroup.com" target="_blank">christophe.nolibos@thalesgroup.com</a>><br>Subject: Re: [ClusterLabs] "pacemakerd: recover properly from Corosync crash" fix<br><br>What OS are you using? 
Does it use systemd?<br><br>What happens when you kill Corosync?<br><br>On Thu, 2024-04-18 at 13:13 +0000, NOLIBOS Christophe via Users wrote:<br>> Classified as: {OPEN}<br>> <br>> Dear All,<br>> <br>> I have a question about the "pacemakerd: recover properly from <br>> Corosync crash" fix implemented in version 2.1.2.<br>> I observed the issue when testing pacemaker version 2.0.5, just <br>> by killing the ‘corosync’ process: Corosync was not recovered.<br>> <br>> I am now using pacemaker version 2.1.5-8.<br>> Doing the same test, I get the same result: Corosync is still not <br>> recovered.<br>> <br>> Please confirm whether the "pacemakerd: recover properly from Corosync crash"<br>> fix implemented in version 2.1.2 covers this scenario.<br>> If it does, did I miss something in the configuration of my cluster?<br>> <br>> Best regards,<br>> <br>> Christophe.<br>> <br>> <br>> <br>> {OPEN}<br>> _______________________________________________<br>> Manage your subscription:<br>> <a href="https://lists.clusterlabs.org/mailman/listinfo/users" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>> <br>> ClusterLabs home: <a href="https://www.clusterlabs.org/" target="_blank">https://www.clusterlabs.org/</a><br>--<br>Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p align="center" style="margin:0cm 0cm 0.0001pt;text-align:center"><span 
style="font-size:10pt;font-family:Calibri,sans-serif;color:black">{OPEN}</span><u></u><u></u></p></blockquote></div></div></div></div></div></blockquote></div></div></blockquote></div></div></div></div></blockquote></div></div></div></div></div></div></blockquote></div></div></div></div></div></blockquote></div></div></div></div></div></blockquote></div></div>
</blockquote></div></div>