<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:2.0cm 42.5pt 2.0cm 3.0cm;}
div.WordSection1
{page:WordSection1;}
--></style></head><body lang=RU link=blue vlink="#954F72"><div class=WordSection1><p class=MsoNormal><span lang=EN-US>Hi!!!<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>DRBD in a Debian 10 guest on a Hyper-V 2012 R2 host causes a kernel panic!!!<o:p></o:p></span></p><p class=MsoNormal>Therefore, I decided not to use this solution. </p><p class=MsoNormal>The question is closed. Thanks for the <span lang=EN-US>support</span>.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><span lang=EN-US>Elias Nasonov</span><br>elias@po-mayak</p><p class=MsoNormal><o:p> </o:p></p><div style='mso-element:para-border-div;border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal style='border:none;padding:0cm'><b>From: </b><a href="mailto:users-request@clusterlabs.org">users-request@clusterlabs.org</a><br><b>Sent: </b>20 November 2019, 22:00<br><b>To: </b><a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br><b>Subject: </b>Users Digest, Vol 58, Issue 22</p></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Send Users mailing list submissions to</p><p class=MsoNormal> users@clusterlabs.org</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>To subscribe or unsubscribe via the World Wide Web, visit</p><p class=MsoNormal> https://lists.clusterlabs.org/mailman/listinfo/users</p><p class=MsoNormal>or, via email, send a message with subject or body 'help' to</p><p class=MsoNormal> users-request@clusterlabs.org</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>You can reach the person managing the list at</p><p class=MsoNormal> users-owner@clusterlabs.org</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>When replying, please edit your Subject line so it is more specific</p><p class=MsoNormal>than "Re: Contents of Users digest..."</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Today's 
Topics:</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal> 1. Antw: HA - order lost when group made (Ulrich Windl)</p><p class=MsoNormal> 2. Antw: Re: Dual Primary DRBD + OCFS2 (elias) (Ulrich Windl)</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>----------------------------------------------------------------------</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Message: 1</p><p class=MsoNormal>Date: Wed, 20 Nov 2019 12:23:49 +0100</p><p class=MsoNormal>From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de></p><p class=MsoNormal>To: <users@clusterlabs.org></p><p class=MsoNormal>Subject: [ClusterLabs] Antw: HA - order lost when group made</p><p class=MsoNormal>Message-ID: <5DD52245020000A100035471@gwsmtp.uni-regensburg.de></p><p class=MsoNormal>Content-Type: text/plain; charset=UTF-8</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>>>> "John Goutbeck" <John.Goutbeck@newsignal.ca> wrote on 19.11.2019 at 22:19</p><p class=MsoNormal>in</p><p class=MsoNormal>message <5DD45C62020000F500006B80@mx1.newsignal.ca>:</p><p class=MsoNormal>> HA - 
order lost when group made</p><p class=MsoNormal>> </p><p class=MsoNormal>> SLES 15 SP1 + HA</p><p class=MsoNormal>> </p><p class=MsoNormal>> nss-sn02:~ # rpm -qa | grep pacem</p><p class=MsoNormal>> pacemaker-cli-1.1.18+20180430.b12c320f5-3.15.1.x86_64</p><p class=MsoNormal>> pacemaker-1.1.18+20180430.b12c320f5-3.15.1.x86_64</p><p class=MsoNormal>> libpacemaker3-1.1.18+20180430.b12c320f5-3.15.1.x86_64</p><p class=MsoNormal>> nss-sn02:~ # rpm -qa | grep crm</p><p class=MsoNormal>> crmsh-scripts-4.1.0+git.1569593061.35f57072-3.14.1.noarch</p><p class=MsoNormal>> crmsh-4.1.0+git.1569593061.35f57072-3.14.1.noarch</p><p class=MsoNormal>> </p><p class=MsoNormal>> 2 node HA cluster setup for DRBD storage</p><p class=MsoNormal>> </p><p class=MsoNormal>> Made an order constraint for resource virtual IPs, iSCSI targets and iSCSI </p><p class=MsoNormal>> LUs.</p><p class=MsoNormal>> </p><p class=MsoNormal>> These resources need to be started in order</p><p class=MsoNormal>> The resources can be started individually (and stopped individually) (before </p><p class=MsoNormal>> order constraint is made)</p><p class=MsoNormal>> </p><p class=MsoNormal>> order o_drbd02_before_iscsitgt02 Serialize: p-ip-14-202:start</p><p class=MsoNormal>p-ip-15-202:start </p><p class=MsoNormal>> p_target_drbd02:start p-lu-drbd02:start</p><p class=MsoNormal>> or</p><p class=MsoNormal>> order o_drbd02_before_iscsitgt02 Serialize: ( p-ip-14-202:start</p><p class=MsoNormal>p-ip-15-202:start </p><p class=MsoNormal>> ) ( p_target_drbd02:start ) ( p-lu-drbd02:start )</p><p class=MsoNormal>> </p><p class=MsoNormal>> ?</p><p class=MsoNormal>> </p><p class=MsoNormal>> Now to make a group resource with the same resources, but when the group is</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> made, the order constraint is gone</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Groups have always had implicit colocation and ordering.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> </p><p 
class=MsoNormal>> group g-drbd02 p-ip-14-202 p-ip-15-202 p-lu-drbd02 p_target_drbd02 meta </p><p class=MsoNormal>> target-role=Stopped</p><p class=MsoNormal>> </p><p class=MsoNormal>> Adding the group with 'crm configure edit' returns these comments</p><p class=MsoNormal>> </p><p class=MsoNormal>> nss-sn02:~ # crm configure edit</p><p class=MsoNormal>> INFO: modified colocation:cl-drbd02 from p-ip-14-202 to g-drbd02</p><p class=MsoNormal>> INFO: modified order:o_drbd03_before_iscsitgt from p-ip-14-202 to g-drbd02</p><p class=MsoNormal>> INFO: modified colocation:cl-drbd03 from p-ip-14-202 to g-drbd02</p><p class=MsoNormal>> INFO: modified order:o_drbd02_before_iscsitgt02 from p-ip-14-202 to</p><p class=MsoNormal>g-drbd02</p><p class=MsoNormal>> INFO: modified order:o_drbd02_before_iscsitgt02 from p-ip-15-202 to</p><p class=MsoNormal>g-drbd02</p><p class=MsoNormal>> INFO: modified order:o_drbd02_before_iscsitgt02 from p-lu-drbd02 to</p><p class=MsoNormal>g-drbd02</p><p class=MsoNormal>> INFO: modified order:o_drbd02_before_iscsitgt02 from p_target_drbd02 to </p><p class=MsoNormal>> g-drbd02</p><p class=MsoNormal>> </p><p class=MsoNormal>> How can an order be made of the same group resources? 
</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>------------------------------</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Message: 2</p><p class=MsoNormal>Date: Wed, 20 Nov 2019 12:29:57 +0100</p><p class=MsoNormal>From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de></p><p class=MsoNormal>To: <users@clusterlabs.org></p><p class=MsoNormal>Subject: [ClusterLabs] Antw: Re: Dual Primary DRBD + OCFS2 (elias)</p><p class=MsoNormal>Message-ID: <5DD523B5020000A100035475@gwsmtp.uni-regensburg.de></p><p class=MsoNormal>Content-Type: text/plain; charset=UTF-8</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Maybe show what you did. Did DLM start successfully?</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>>>> Ilya Nasonov <elias@po-mayak.ru> wrote on 20.11.2019 at 06:12 in</p><p class=MsoNormal>message</p><p class=MsoNormal><20191120051305.052936005F7@iwtm.local>:</p><p class=MsoNormal>> Thanks Roger!</p><p class=MsoNormal>> </p><p class=MsoNormal>> I configured according to the SUSE doc for OCFS2, but the DLM resource stops with</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> error -107 (no interface found).</p><p class=MsoNormal>> I think the OCFS2 cluster has to be configured manually, but </p><p class=MsoNormal>> the correct way would be to do it through the Pacemaker RA.</p><p class=MsoNormal>> </p><p class=MsoNormal>> Ilya Nasonov</p><p class=MsoNormal>> elias@po-mayak</p><p class=MsoNormal>> </p><p class=MsoNormal>> From: users-request@clusterlabs.org </p><p class=MsoNormal>> Sent: 19 November 2019, 
19:32</p><p class=MsoNormal>> To: users@clusterlabs.org </p><p class=MsoNormal>> Subject: Users Digest, Vol 58, Issue 20</p><p class=MsoNormal>> </p><p class=MsoNormal>> Send Users mailing list submissions to</p><p class=MsoNormal>> users@clusterlabs.org </p><p class=MsoNormal>> </p><p class=MsoNormal>> To subscribe or unsubscribe via the World Wide Web, visit</p><p class=MsoNormal>> https://lists.clusterlabs.org/mailman/listinfo/users </p><p class=MsoNormal>> or, via email, send a message with subject or body 'help' to</p><p class=MsoNormal>> users-request@clusterlabs.org </p><p class=MsoNormal>> </p><p class=MsoNormal>> You can reach the person managing the list at</p><p class=MsoNormal>> users-owner@clusterlabs.org </p><p class=MsoNormal>> </p><p class=MsoNormal>> When replying, please edit your Subject line so it is more specific</p><p class=MsoNormal>> than "Re: Contents of Users digest..."</p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> Today's Topics:</p><p class=MsoNormal>> </p><p class=MsoNormal>> 1. Re: Antw: Re: Pacemaker 2.0.3-rc3 now available</p><p class=MsoNormal>> (Jehan-Guillaume de Rorthais)</p><p class=MsoNormal>> 2. corosync 3.0.1 on Debian/Buster reports some MTU errors</p><p class=MsoNormal>> (Jean-Francois Malouin)</p><p class=MsoNormal>> 3. Dual Primary DRBD + OCFS2 (Ilya Nasonov)</p><p class=MsoNormal>> 4. Re: Dual Primary DRBD + OCFS2 (Roger Zhou)</p><p class=MsoNormal>> 5. Q: ldirectord and "checktype = external-perl" broken?</p><p class=MsoNormal>> (Ulrich Windl)</p><p class=MsoNormal>> 6. 
Q: ocf:pacemaker:ping (Ulrich Windl)</p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> ----------------------------------------------------------------------</p><p class=MsoNormal>> </p><p class=MsoNormal>> Message: 1</p><p class=MsoNormal>> Date: Mon, 18 Nov 2019 18:13:57 +0100</p><p class=MsoNormal>> From: Jehan-Guillaume de Rorthais <jgdr@dalibo.com></p><p class=MsoNormal>> To: Ken Gaillot <kgaillot@redhat.com></p><p class=MsoNormal>> Cc: Cluster Labs - All topics related to open-source clustering</p><p class=MsoNormal>> welcomed <users@clusterlabs.org></p><p class=MsoNormal>> Subject: Re: [ClusterLabs] Antw: Re: Pacemaker 2.0.3-rc3 now</p><p class=MsoNormal>> available</p><p class=MsoNormal>> Message-ID: <20191118181357.6899c051@firost></p><p class=MsoNormal>> Content-Type: text/plain; charset=UTF-8</p><p class=MsoNormal>> </p><p class=MsoNormal>> On Mon, 18 Nov 2019 10:45:25 -0600</p><p class=MsoNormal>> Ken Gaillot <kgaillot@redhat.com> wrote:</p><p class=MsoNormal>> </p><p class=MsoNormal>>> On Fri, 2019-11-15 at 14:35 +0100, Jehan-Guillaume de Rorthais wrote:</p><p class=MsoNormal>>> > On Thu, 14 Nov 2019 11:09:57 -0600</p><p class=MsoNormal>>> > Ken Gaillot <kgaillot@redhat.com> wrote:</p><p class=MsoNormal>>> > </p><p class=MsoNormal>>> > > On Thu, 2019-11-14 at 15:22 +0100, Ulrich Windl wrote: </p><p class=MsoNormal>>> > > > > > > Jehan-Guillaume de Rorthais <jgdr@dalibo.com> schrieb am</p><p class=MsoNormal>>> > > > > > > 14.11.2019 um </p><p class=MsoNormal>>> > > > </p><p class=MsoNormal>>> > > > 15:17 in</p><p class=MsoNormal>>> > > > Nachricht <20191114151719.6cbf4e38@firost>: </p><p class=MsoNormal>>> > > > > On Wed, 13 Nov 2019 17:30:31 ?0600</p><p class=MsoNormal>>> > > > > Ken Gaillot <kgaillot@redhat.com> wrote:</p><p class=MsoNormal>>> > > > > ... 
</p><p class=MsoNormal>>> > > > > > A longstanding pain point in the logs has been improved.</p><p class=MsoNormal>>> > > > > > Whenever</p><p class=MsoNormal>>> > > > > > the</p><p class=MsoNormal>>> > > > > > scheduler processes resource history, it logs a warning for</p><p class=MsoNormal>>> > > > > > any</p><p class=MsoNormal>>> > > > > > failures it finds, regardless of whether they are new or old,</p><p class=MsoNormal>>> > > > > > which can</p><p class=MsoNormal>>> > > > > > confuse anyone reading the logs. Now, the log will contain</p><p class=MsoNormal>>> > > > > > the</p><p class=MsoNormal>>> > > > > > time of</p><p class=MsoNormal>>> > > > > > the failure, so it's obvious whether you're seeing the same</p><p class=MsoNormal>>> > > > > > event</p><p class=MsoNormal>>> > > > > > or</p><p class=MsoNormal>>> > > > > > not. The log will also contain the exit reason if one was</p><p class=MsoNormal>>> > > > > > provided by</p><p class=MsoNormal>>> > > > > > the resource agent, for easier troubleshooting. </p><p class=MsoNormal>>> > > > > </p><p class=MsoNormal>>> > > > > I've been hurt by this in the past and I was wondering what was</p><p class=MsoNormal>>> > > > > the</p><p class=MsoNormal>>> > > > > point of</p><p class=MsoNormal>>> > > > > warning again and again in the logs for past failures during</p><p class=MsoNormal>>> > > > > scheduling? </p><p class=MsoNormal>>> > > > > What this information brings to the administrator? </p><p class=MsoNormal>>> > > </p><p class=MsoNormal>>> > > The controller will log an event just once, when it happens.</p><p class=MsoNormal>>> > > </p><p class=MsoNormal>>> > > The scheduler, on the other hand, uses the entire recorded resource</p><p class=MsoNormal>>> > > history to determine the current resource state. Old failures (that</p><p class=MsoNormal>>> > > haven't been cleaned) must be taken into account. </p><p class=MsoNormal>>> > </p><p class=MsoNormal>>> > OK, I wasn't aware of this. 
If you have a few minutes, I would be</p><p class=MsoNormal>>> > interested to</p><p class=MsoNormal>>> > know why the full history is needed and not just find the latest</p><p class=MsoNormal>>> > entry from</p><p class=MsoNormal>>> > there. Or maybe there's some comments in the source code that already</p><p class=MsoNormal>>> > cover this question? </p><p class=MsoNormal>>> </p><p class=MsoNormal>>> The full *recorded* history consists of the most recent operation that</p><p class=MsoNormal>>> affects the state (like start/stop/promote/demote), the most recent</p><p class=MsoNormal>>> failed operation, and the most recent results of any recurring</p><p class=MsoNormal>>> monitors.</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> For example there may be a failed monitor, but whether the resource is</p><p class=MsoNormal>>> considered failed or not would depend on whether there was a more</p><p class=MsoNormal>>> recent successful stop or start. Even if the failed monitor has been</p><p class=MsoNormal>>> superseded, it needs to stay in the history for display purposes until</p><p class=MsoNormal>>> the user has cleaned it up.</p><p class=MsoNormal>> </p><p class=MsoNormal>> OK, understood.</p><p class=MsoNormal>> </p><p class=MsoNormal>> Maybe that's why "FAILED" appears shortly in crm_mon during a resource move</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> on</p><p class=MsoNormal>> a clean resource, but with past failures? Maybe I should dig this weird</p><p class=MsoNormal>> behavior and wrap up a bug report if I confirm this?</p><p class=MsoNormal>> </p><p class=MsoNormal>>> > > Every run of the scheduler is completely independent, so it doesn't</p><p class=MsoNormal>>> > > know about any earlier runs or what they logged. Think of it like</p><p class=MsoNormal>>> > > Frosty the Snowman saying "Happy Birthday!" every time his hat is</p><p class=MsoNormal>>> > > put</p><p class=MsoNormal>>> > > on. 
</p><p class=MsoNormal>>> > </p><p class=MsoNormal>>> > I don't have this ref :) </p><p class=MsoNormal>>> </p><p class=MsoNormal>>> I figured not everybody would, but it was too fun to pass up :)</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> The snowman comes to life every time his magic hat is put on, but to</p><p class=MsoNormal>>> him each time feels like he's being born for the first time, so he says</p><p class=MsoNormal>>> "Happy Birthday!"</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> https://www.youtube.com/watch?v=1PbWTEYoN8o </p><p class=MsoNormal>> </p><p class=MsoNormal>> heh :)</p><p class=MsoNormal>> </p><p class=MsoNormal>>> > > As far as each run is concerned, it is the first time it's seen the</p><p class=MsoNormal>>> > > history. This is what allows the DC role to move from node to node,</p><p class=MsoNormal>>> > > and</p><p class=MsoNormal>>> > > the scheduler to be run as a simulation using a saved CIB file.</p><p class=MsoNormal>>> > > </p><p class=MsoNormal>>> > > We could change the wording further if necessary. The previous</p><p class=MsoNormal>>> > > version</p><p class=MsoNormal>>> > > would log something like:</p><p class=MsoNormal>>> > > </p><p class=MsoNormal>>> > > warning: Processing failed monitor of my-rsc on node1: not running</p><p class=MsoNormal>>> > > </p><p class=MsoNormal>>> > > and this latest change will log it like:</p><p class=MsoNormal>>> > > </p><p class=MsoNormal>>> > > warning: Unexpected result (not running: No process state file</p><p class=MsoNormal>>> > > found)</p><p class=MsoNormal>>> > > was recorded for monitor of my-rsc on node1 at Nov 12 19:19:02 2019 </p><p class=MsoNormal>>> > </p><p class=MsoNormal>>> > /result/state/ ? 
</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> It's the result of a resource agent action, so it could be for example</p><p class=MsoNormal>>> a timeout or a permissions issue.</p><p class=MsoNormal>> </p><p class=MsoNormal>> ok</p><p class=MsoNormal>> </p><p class=MsoNormal>>> > > I wanted to be explicit about the message being about processing</p><p class=MsoNormal>>> > > resource history that may or may not be the first time it's been</p><p class=MsoNormal>>> > > processed and logged, but everything I came up with seemed too long</p><p class=MsoNormal>>> > > for</p><p class=MsoNormal>>> > > a log line. Another possibility might be something like:</p><p class=MsoNormal>>> > > </p><p class=MsoNormal>>> > > warning: Using my-rsc history to determine its current state on</p><p class=MsoNormal>>> > > node1:</p><p class=MsoNormal>>> > > Unexpected result (not running: No process state file found) was</p><p class=MsoNormal>>> > > recorded for monitor at Nov 12 19:19:02 2019 </p><p class=MsoNormal>>> > </p><p class=MsoNormal>>> > I better like the first one.</p><p class=MsoNormal>>> > </p><p class=MsoNormal>>> > However, it feels like implementation details exposed to the world,</p><p class=MsoNormal>>> > isn't it? How useful is this information for the end user? What the</p><p class=MsoNormal>>> > user can do</p><p class=MsoNormal>>> > with this information? There's noting to fix and this is not actually</p><p class=MsoNormal>>> > an error</p><p class=MsoNormal>>> > of the current running process.</p><p class=MsoNormal>>> > </p><p class=MsoNormal>>> > I still fail to understand why the scheduler doesn't process the</p><p class=MsoNormal>>> > history</p><p class=MsoNormal>>> > silently, whatever it finds there, then warn for something really</p><p class=MsoNormal>>> > important if</p><p class=MsoNormal>>> > the final result is not expected... 
</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> From the scheduler's point of view, it's all relevant information that</p><p class=MsoNormal>>> goes into the decision making. Even an old failure can cause new</p><p class=MsoNormal>>> actions, for example if quorum was not held at the time but has now</p><p class=MsoNormal>>> been reached, or if there is a failure-timeout that just expired. So</p><p class=MsoNormal>>> any failure history is important to understanding whatever the</p><p class=MsoNormal>>> scheduler says needs to be done.</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> Also, the scheduler is run on the DC, which is not necessarily the node</p><p class=MsoNormal>>> that executed the action. So it's useful for troubleshooting to present</p><p class=MsoNormal>>> a picture of the whole cluster on the DC, rather than just what's the</p><p class=MsoNormal>>> situation on the local node.</p><p class=MsoNormal>> </p><p class=MsoNormal>> OK, kind of got it. The scheduler need to summarize the chain of event to</p><p class=MsoNormal>> define the state of a resource based on the last event.</p><p class=MsoNormal>> </p><p class=MsoNormal>>> I could see an argument for lowering it from warning to notice, but</p><p class=MsoNormal>>> it's a balance between what's most useful during normal operation and</p><p class=MsoNormal>>> what's most useful during troubleshooting.</p><p class=MsoNormal>> </p><p class=MsoNormal>> So in my humble opinion, the messages should definitely be at notice level.</p><p class=MsoNormal>> Maybe they should even go to debug level. 
I never had to troubleshoot a bad</p><p class=MsoNormal>> decision from the scheduler because of a bad state summary.</p><p class=MsoNormal>> Moreover, if needed, the admin can still study the history from cib backed </p><p class=MsoNormal>> up</p><p class=MsoNormal>> on disk, isn't it?</p><p class=MsoNormal>> </p><p class=MsoNormal>> The alternative would be to spit the event chain in details only if the </p><p class=MsoNormal>> result</p><p class=MsoNormal>> of the summary is different from what the scheduler was expecting?</p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> ------------------------------</p><p class=MsoNormal>> </p><p class=MsoNormal>> Message: 2</p><p class=MsoNormal>> Date: Mon, 18 Nov 2019 16:31:34 -0500</p><p class=MsoNormal>> From: Jean-Francois Malouin <Jean-Francois.Malouin@bic.mni.mcgill.ca></p><p class=MsoNormal>> To: The Pacemaker Cluster List <users@clusterlabs.org></p><p class=MsoNormal>> Subject: [ClusterLabs] corosync 3.0.1 on Debian/Buster reports some</p><p class=MsoNormal>> MTU errors</p><p class=MsoNormal>> Message-ID: <20191118213134.huecj2xnbtrtdqmm@bic.mni.mcgill.ca></p><p class=MsoNormal>> Content-Type: text/plain; charset=us-ascii</p><p class=MsoNormal>> </p><p class=MsoNormal>> Hi,</p><p class=MsoNormal>> </p><p class=MsoNormal>> Maybe not directly a pacemaker question but maybe some of you have seen</p><p class=MsoNormal>this</p><p class=MsoNormal>> problem:</p><p class=MsoNormal>> </p><p class=MsoNormal>> A 2 node pacemaker cluster running corosync-3.0.1 with dual communication </p><p class=MsoNormal>> ring</p><p class=MsoNormal>> sometimes reports errors like this in the corosync log file:</p><p class=MsoNormal>> </p><p class=MsoNormal>> [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366</p><p class=MsoNormal>> [KNET ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366</p><p class=MsoNormal>> [KNET ] pmtud: Global data MTU changed to: 1366</p><p class=MsoNormal>> [CFG 
] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> run-time</p><p class=MsoNormal>> [CFG ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>> run-time</p><p class=MsoNormal>> </p><p class=MsoNormal>> Those do not happen very frequenly, once a week or so...</p><p class=MsoNormal>> </p><p class=MsoNormal>> However the system log on the nodes reports those much more frequently, a </p><p class=MsoNormal>> few</p><p class=MsoNormal>> times a day:</p><p class=MsoNormal>> </p><p class=MsoNormal>> Nov 17 23:26:20 node1 corosync[2258]: [KNET ] link: host: 2 link: 1 is </p><p class=MsoNormal>> down</p><p class=MsoNormal>> Nov 17 23:26:20 node1 corosync[2258]: [KNET ] host: host: 2 (passive) </p><p class=MsoNormal>> best link: 0 (pri: 0)</p><p class=MsoNormal>> Nov 17 23:26:26 node1 corosync[2258]: [KNET ] rx: host: 2 link: 1 is up</p><p class=MsoNormal>> Nov 17 23:26:26 node1 corosync[2258]: [KNET ] host: host: 2 (passive) </p><p class=MsoNormal>> best link: 1 (pri: 1)</p><p class=MsoNormal>> </p><p class=MsoNormal>> Are those to be dismissed or are they indicative of a network </p><p class=MsoNormal>> misconfig/problem?</p><p class=MsoNormal>> I tried setting 'knet_transport: udpu' in the totem section (the default </p><p class=MsoNormal>> value)</p><p class=MsoNormal>> but it didn't seem to make a difference...Hard coding netmtu to 1500 and</p><p class=MsoNormal>> allowing for longer (10s) token timeout also didn't seem to affect the </p><p class=MsoNormal>> issue.</p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> Corosync config follows:</p><p class=MsoNormal>> </p><p class=MsoNormal>> /etc/corosync/corosync.conf</p><p class=MsoNormal>> </p><p class=MsoNormal>> totem {</p><p class=MsoNormal>> version: 2</p><p class=MsoNormal>> cluster_name: bicha</p><p class=MsoNormal>> transport: knet</p><p 
class=MsoNormal>> link_mode: passive</p><p class=MsoNormal>> ip_version: ipv4</p><p class=MsoNormal>> token: 10000</p><p class=MsoNormal>> netmtu: 1500</p><p class=MsoNormal>> knet_transport: sctp</p><p class=MsoNormal>> crypto_model: openssl</p><p class=MsoNormal>> crypto_hash: sha256</p><p class=MsoNormal>> crypto_cipher: aes256</p><p class=MsoNormal>> keyfile: /etc/corosync/authkey</p><p class=MsoNormal>> interface {</p><p class=MsoNormal>> linknumber: 0</p><p class=MsoNormal>> knet_transport: udp</p><p class=MsoNormal>> knet_link_priority: 0</p><p class=MsoNormal>> }</p><p class=MsoNormal>> interface {</p><p class=MsoNormal>> linknumber: 1</p><p class=MsoNormal>> knet_transport: udp</p><p class=MsoNormal>> knet_link_priority: 1</p><p class=MsoNormal>> }</p><p class=MsoNormal>> }</p><p class=MsoNormal>> quorum {</p><p class=MsoNormal>> provider: corosync_votequorum</p><p class=MsoNormal>> two_node: 1</p><p class=MsoNormal>> # expected_votes: 2</p><p class=MsoNormal>> }</p><p class=MsoNormal>> nodelist {</p><p class=MsoNormal>> node {</p><p class=MsoNormal>> ring0_addr: xxx.xxx.xxx.xxx</p><p class=MsoNormal>> ring1_addr: zzz.zzz.zzz.zzx</p><p class=MsoNormal>> name: node1</p><p class=MsoNormal>> nodeid: 1</p><p class=MsoNormal>> } </p><p class=MsoNormal>> node {</p><p class=MsoNormal>> ring0_addr: xxx.xxx.xxx.xxy</p><p class=MsoNormal>> ring1_addr: zzz.zzz.zzz.zzy</p><p class=MsoNormal>> name: node2</p><p class=MsoNormal>> nodeid: 2</p><p class=MsoNormal>> } </p><p class=MsoNormal>> }</p><p class=MsoNormal>> logging {</p><p class=MsoNormal>> to_logfile: yes</p><p class=MsoNormal>> to_syslog: yes</p><p class=MsoNormal>> logfile: /var/log/corosync/corosync.log</p><p class=MsoNormal>> syslog_facility: daemon</p><p class=MsoNormal>> debug: off</p><p class=MsoNormal>> timestamp: on</p><p class=MsoNormal>> logger_subsys {</p><p class=MsoNormal>> subsys: QUORUM</p><p class=MsoNormal>> debug: off</p><p class=MsoNormal>> }</p><p class=MsoNormal>> }</p><p class=MsoNormal>> 
</p><p class=MsoNormal>> </p><p class=MsoNormal>> ------------------------------</p><p class=MsoNormal>> </p><p class=MsoNormal>> Message: 3</p><p class=MsoNormal>> Date: Tue, 19 Nov 2019 13:51:59 +0500</p><p class=MsoNormal>> From: Ilya Nasonov <elias@po-mayak.ru></p><p class=MsoNormal>> To: " users@clusterlabs.org" <users@clusterlabs.org></p><p class=MsoNormal>> Subject: [ClusterLabs] Dual Primary DRBD + OCFS2</p><p class=MsoNormal>> Message-ID: <20191119085203.2771960014A@iwtm.local></p><p class=MsoNormal>> Content-Type: text/plain; charset="utf-8"</p><p class=MsoNormal>> </p><p class=MsoNormal>> Hello!</p><p class=MsoNormal>> </p><p class=MsoNormal>> I configured a cluster (2-node DRBD+DLM+CFS2) and it works.</p><p class=MsoNormal>> I heard the opinion that the OCFS2 file system is better. I found an old cluster </p><p class=MsoNormal>> setup description: </p><p class=MsoNormal>> https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2 </p><p class=MsoNormal>> but as I understand it, the o2cb service is not supported by Pacemaker on Debian.</p><p class=MsoNormal>> Where can I get the latest information on setting up OCFS2?</p><p class=MsoNormal>> </p><p class=MsoNormal>> Best regards,</p><p class=MsoNormal>> Ilya 
Nasonov</p><p class=MsoNormal>> elias@po-mayak</p><p class=MsoNormal>> </p><p class=MsoNormal>> -------------- next part --------------</p><p class=MsoNormal>> An HTML attachment was scrubbed...</p><p class=MsoNormal>> URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20191119/95e4c791/attachment-0001.html></p><p class=MsoNormal>> </p><p class=MsoNormal>> ------------------------------</p><p class=MsoNormal>> </p><p class=MsoNormal>> Message: 4</p><p class=MsoNormal>> Date: Tue, 19 Nov 2019 10:01:01 +0000</p><p class=MsoNormal>> From: Roger Zhou <ZZhou@suse.com></p><p class=MsoNormal>> To: "users@clusterlabs.org" <users@clusterlabs.org></p><p class=MsoNormal>> Subject: Re: [ClusterLabs] Dual Primary DRBD + OCFS2</p><p class=MsoNormal>> Message-ID: <572e29b1-4c05-a985-7419-462310d1c626@suse.com></p><p class=MsoNormal>> Content-Type: text/plain; charset="utf-8"</p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> On 11/19/19 4:51 PM, Ilya Nasonov wrote:</p><p class=MsoNormal>>> Hello!</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> I configured a cluster (2-node DRBD+DLM+CFS2) and it works.</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> I heard the opinion that the OCFS2 file system is better. I found an old </p><p class=MsoNormal>>> cluster setup </p><p class=MsoNormal>>> description: https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2 </p><p class=MsoNormal>>> </p><p class=MsoNormal>>> but as I understand it, the o2cb service is not supported by Pacemaker on Debian.</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> Where can I get the latest information on setting up OCFS2?</p><p class=MsoNormal>> </p><p class=MsoNormal>> You can probably refer to the SUSE documentation for OCFS2 with Pacemaker [1]. 
It should </p><p class=MsoNormal>> not be much different to adapt to Debian, I feel.</p><p class=MsoNormal>> </p><p class=MsoNormal>> [1] </p><p class=MsoNormal>> https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-ocfs2.html</p><p class=MsoNormal>> </p><p class=MsoNormal>> Cheers,</p><p class=MsoNormal>> Roger</p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>>> </p><p class=MsoNormal>>> Best regards,</p><p class=MsoNormal>>> Ilya Nasonov</p><p class=MsoNormal>>> elias@po-mayak</p><p class=MsoNormal>>> </p><p class=MsoNormal>>> </p><p class=MsoNormal>>> _______________________________________________</p><p class=MsoNormal>>> Manage your subscription:</p><p class=MsoNormal>>> https://lists.clusterlabs.org/mailman/listinfo/users </p><p class=MsoNormal>>> </p><p class=MsoNormal>>> ClusterLabs home: https://www.clusterlabs.org/ </p><p class=MsoNormal>>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> ------------------------------</p><p class=MsoNormal>> </p><p class=MsoNormal>> Message: 5</p><p class=MsoNormal>> Date: Tue, 19 Nov 2019 14:58:08 +0100</p><p class=MsoNormal>> From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de></p><p class=MsoNormal>> To: <users@clusterlabs.org></p><p class=MsoNormal>> Subject: [ClusterLabs] Q: ldirectord and "checktype = external-perl"</p><p class=MsoNormal>> broken?</p><p class=MsoNormal>> Message-ID: <5DD3F4F0020000A10003544E@gwsmtp.uni-regensburg.de></p><p class=MsoNormal>> Content-Type: text/plain; charset=US-ASCII</p><p class=MsoNormal>> </p><p class=MsoNormal>> Hi!</p><p class=MsoNormal>> </p><p class=MsoNormal>> In SLES11 I developed some special check program for ldirectord 3.9.5 in </p><p class=MsoNormal>> Perl, but then I discovered that it won't work correctly with "checktype = </p><p class=MsoNormal>> external-perl". 
Changing to "checktype = external" made it work.</p><p class=MsoNormal>> Today I played with it in SLES12 SP4 and </p><p class=MsoNormal>> ldirectord-4.3.018.a7fb5035-3.25.1.18557.0.PTF.1153889.x86_64, just to </p><p class=MsoNormal>> discover that it still does not work.</p><p class=MsoNormal>> </p><p class=MsoNormal>> So I wonder: Is it really broken all the time, or is there some special </p><p class=MsoNormal>> thing to consider that isn't written in the manual page?</p><p class=MsoNormal>> </p><p class=MsoNormal>> The observable effect is that the weight is set to 0 right after starting </p><p class=MsoNormal>> with weight = 1. If it works, the weight is set to 1.</p><p class=MsoNormal>> </p><p class=MsoNormal>> Regards,</p><p class=MsoNormal>> Ulrich</p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> ------------------------------</p><p class=MsoNormal>> </p><p class=MsoNormal>> Message: 6</p><p class=MsoNormal>> Date: Tue, 19 Nov 2019 15:32:43 +0100</p><p class=MsoNormal>> From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de></p><p class=MsoNormal>> To: <users@clusterlabs.org></p><p class=MsoNormal>> Subject: [ClusterLabs] Q: ocf:pacemaker:ping</p><p class=MsoNormal>> Message-ID: <5DD3FD0B020000A100035452@gwsmtp.uni-regensburg.de></p><p class=MsoNormal>> Content-Type: text/plain; charset=US-ASCII</p><p class=MsoNormal>> </p><p class=MsoNormal>> Hi!</p><p class=MsoNormal>> </p><p class=MsoNormal>> Seems today I'm digging out old stuff:</p><p class=MsoNormal>> I can remember that in 2011 the documentation for ping's dampen was not very helpful. 
I think it still is:</p><p class=MsoNormal>> </p><p class=MsoNormal>> (RA info)</p><p class=MsoNormal>> node connectivity (ocf:pacemaker:ping)</p><p class=MsoNormal>> </p><p class=MsoNormal>> Every time the monitor action is run, this resource agent records (in the </p><p class=MsoNormal>> CIB) the current number of nodes the host can connect to using the system </p><p class=MsoNormal>> fping (preferred) or ping tool.</p><p class=MsoNormal>> </p><p class=MsoNormal>> Parameters (*: required, []: default):</p><p class=MsoNormal>> </p><p class=MsoNormal>> pidfile (string, [/var/run/ping-ping]):</p><p class=MsoNormal>> PID file</p><p class=MsoNormal>> </p><p class=MsoNormal>> dampen (integer, [5s]): Dampening interval</p><p class=MsoNormal>> The time to wait (dampening) further changes occur</p><p class=MsoNormal>> </p><p class=MsoNormal>> name (string, [pingd]): Attribute name</p><p class=MsoNormal>> The name of the attributes to set. This is the name to be used in the </p><p class=MsoNormal>> constraints.</p><p class=MsoNormal>> </p><p class=MsoNormal>> multiplier (integer, [1]): Value multiplier</p><p class=MsoNormal>> The number by which to multiply the number of connected ping nodes by</p><p class=MsoNormal>> </p><p class=MsoNormal>> host_list* (string): Host list</p><p class=MsoNormal>> A space separated list of ping nodes to count.</p><p class=MsoNormal>> </p><p class=MsoNormal>> attempts (integer, [3]): no. 
of ping attempts</p><p class=MsoNormal>> Number of ping attempts, per host, before declaring it dead</p><p class=MsoNormal>> </p><p class=MsoNormal>> timeout (integer, [2]): ping timeout in seconds</p><p class=MsoNormal>> How long, in seconds, to wait before declaring a ping lost</p><p class=MsoNormal>> </p><p class=MsoNormal>> options (string): Extra Options</p><p class=MsoNormal>> A catch all for any other options that need to be passed to ping.</p><p class=MsoNormal>> </p><p class=MsoNormal>> failure_score (integer):</p><p class=MsoNormal>> Resource is failed if the score is less than failure_score.</p><p class=MsoNormal>> Default never fails.</p><p class=MsoNormal>> </p><p class=MsoNormal>> use_fping (boolean, [1]): Use fping if available</p><p class=MsoNormal>> Use fping rather than ping, if found. If set to 0, fping</p><p class=MsoNormal>> will not be used even if present.</p><p class=MsoNormal>> </p><p class=MsoNormal>> debug (string, [false]): Verbose logging</p><p class=MsoNormal>> Enables to use default attrd_updater verbose logging on every call.</p><p class=MsoNormal>> </p><p class=MsoNormal>> Operations' defaults (advisory minimum):</p><p class=MsoNormal>> </p><p class=MsoNormal>> start timeout=60</p><p class=MsoNormal>> stop timeout=20</p><p class=MsoNormal>> monitor timeout=60 interval=10</p><p class=MsoNormal>> ---------</p><p class=MsoNormal>> </p><p class=MsoNormal>> "The name of the attributes to set.": Why plural ("attributes")?</p><p class=MsoNormal>> "The time to wait (dampening) further changes occur": Is this an English </p><p class=MsoNormal>> sentence at all?</p><p class=MsoNormal>> </p><p class=MsoNormal>> Regards,</p><p class=MsoNormal>> Ulrich</p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> </p><p class=MsoNormal>> ------------------------------</p><p class=MsoNormal>> </p><p class=MsoNormal>> Subject: Digest Footer</p><p class=MsoNormal>> </p><p 
class=MsoNormal>> _______________________________________________</p><p class=MsoNormal>> Manage your subscription:</p><p class=MsoNormal>> https://lists.clusterlabs.org/mailman/listinfo/users </p><p class=MsoNormal>> </p><p class=MsoNormal>> ClusterLabs home: https://www.clusterlabs.org/ </p><p class=MsoNormal>> </p><p class=MsoNormal>> ------------------------------</p><p class=MsoNormal>> </p><p class=MsoNormal>> End of Users Digest, Vol 58, Issue 20</p><p class=MsoNormal>> *************************************</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>------------------------------</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>End of Users Digest, Vol 58, Issue 22</p><p class=MsoNormal>*************************************</p><p class=MsoNormal><o:p> </o:p></p></div></body></html>