<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 5/29/21 12:05 AM, Strahil Nikolov
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:1607468630.724688.1622239546135@mail.yahoo.com">
      Most RA scripts are written in bash.
      <div id="yMail_cursorElementTracker_1622239249817">Usually you can
        change the shebang to '#!/usr/bin/bash -x', or you can set
        trace_ra=1 via 'pcs resource update RESOURCE trace_ra=1
        trace_file=/somepath'.</div>
      <div id="yMail_cursorElementTracker_1622239377845"><br>
      </div>
      <div id="yMail_cursorElementTracker_1622239378556">If you don't
        define trace_file, the trace files should be created under
        /var/lib/heartbeat/trace_ra (this is from memory, so use
        find/locate to confirm).</div>
      <div id="yMail_cursorElementTracker_1622239515230"><br>
      </div>
      <div id="yMail_cursorElementTracker_1622239515437">Best Regards,</div>
      <div id="yMail_cursorElementTracker_1622239520757">Strahil Nikolov</div>
      <div id="yMail_cursorElementTracker_1622239524008"><br>
      </div>
      <div id="yMail_cursorElementTracker_1622239513238">
        <blockquote style="margin: 0 0 20px 0;">
          <div style="font-family:Roboto, sans-serif; color:#6D00F6;">
            <div>On Fri, May 28, 2021 at 22:10, Abithan Kumarasamy</div>
            <div><a class="moz-txt-link-rfc2396E" href="mailto:Abithan.Kumarasamy@ibm.com">&lt;Abithan.Kumarasamy@ibm.com&gt;</a> wrote:</div>
          </div>
          <div style="padding: 10px 0 0 20px; margin: 10px 0 0 0;
            border-left: 1px solid #6D00F6;">
            <div id="yiv2500575883">
              <div class="yiv2500575883socmaildefaultfont" dir="ltr"
                style="font-family:Arial, Helvetica,
                sans-serif;font-size:10pt;">
                <div dir="ltr">
                  <div style="font-size:medium;">Hello Team,</div>
                  <div style="font-size:medium;"> </div>
                  <div style="font-size:medium;">We have been recently
                    running some tests on our Pacemaker clusters that
                    involve two Pacemaker resources on two nodes
                    respectively. The test case in which we are
                    experiencing intermittent problems is one in which
                    we bring down the Pacemaker resources on both nodes
                    simultaneously. Our expected behaviour is that the
                    monitor function in our resource agent script
                    detects the downtime and Pacemaker then issues a start
                    command. This happens on most iterations
                    of our test case. However, on some iterations
                    (approximately 1 out of 30 simulations) we notice
                    that Pacemaker is issuing the start command on only
                    one of the hosts. On the troubled host the monitor
                    function is logging that the resource is down as
                    expected and is exiting with the OCF_ERR_GENERIC return
                    code (1). According to the documentation, this
                    should perform a soft disaster recovery, but when
                    scanning the Pacemaker logs, there is no indication
                    of the start command being issued or invoked.
                    However, it works as expected on the other host. </div>
                  <div style="font-size:medium;"> </div>
                  <div style="font-size:medium;">To summarize the issue:</div>
                  <ol>
                    <li><span style="font-size:12pt;">The resource’s
                        monitor is running and returning OCF_ERR_GENERIC</span></li>
                    <li><span style="font-size:12pt;">The constraints we
                        have for the resources are satisfied.</span></li>
                    <li><span style="font-size:12pt;">There are no
                        visible differences in the Pacemaker logs
                        between the test iteration that failed, and the
                        multiple successful iterations, other than the
                        fact that Pacemaker does not start the resource
                        after the monitor returns OCF_ERR_GENERIC</span><br>
                    </li>
                  </ol>
                </div>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
    In general, Pacemaker won't directly start a resource after receiving<br>
    OCF_ERR_GENERIC from the monitor. As you already mentioned,<br>
    it will try to recover the resource to a known state by first<br>
    stopping it, and the resource has to be reported as stopped<br>
    after that. Only then will it try to restart it, if the rules say so.<br>
    Which resource agent are you using? If you brought down<br>
    the resource manually, the monitor shouldn't report OCF_ERR_GENERIC<br>
    but stopped (OCF_NOT_RUNNING).<br>
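    <br>
    As a rough illustration of that distinction, here is a minimal<br>
    sketch of a monitor function. It is not taken from any shipped<br>
    agent: demo_monitor, PIDFILE and the PID-file convention are<br>
    assumptions made up for the example. Only the three exit code<br>
    values are the standard OCF ones.<br>
    <pre>
```shell
#!/bin/sh
# Hypothetical sketch: demo_monitor and PIDFILE are illustrative names,
# not part of any shipped resource agent.

# Standard OCF exit codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Assumption: the service writes its PID here while it is running.
PIDFILE="${PIDFILE:-/tmp/demo_service.pid}"

demo_monitor() {
    # No PID file: the service was shut down cleanly, so report "stopped".
    if [ ! -f "$PIDFILE" ]; then
        return "$OCF_NOT_RUNNING"
    fi

    pid=$(cat "$PIDFILE")

    # PID file present and the process answers signal 0: healthy.
    if kill -0 "$pid" 2>/dev/null; then
        return "$OCF_SUCCESS"
    fi

    # PID file left behind but no such process: this sketch treats that
    # as a failure, which asks the cluster to recover the resource.
    return "$OCF_ERR_GENERIC"
}
```
    </pre>
    With a monitor along these lines, a resource that was shut down<br>
    cleanly is reported as stopped rather than failed. Whether a stale<br>
    PID file counts as failed or as stopped is a design choice for the<br>
    agent.<br>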
    <br>
    Regards,<br>
    Klaus<br>
    <blockquote type="cite"
      cite="mid:1607468630.724688.1622239546135@mail.yahoo.com">
      <div id="yMail_cursorElementTracker_1622239513238">
        <blockquote style="margin: 0 0 20px 0;">
          <div style="padding: 10px 0 0 20px; margin: 10px 0 0 0;
            border-left: 1px solid #6D00F6;">
            <div id="yiv2500575883">
              <div class="yiv2500575883socmaildefaultfont" dir="ltr"
                style="font-family:Arial, Helvetica,
                sans-serif;font-size:10pt;">
                <div dir="ltr">
                  <div style="font-size:medium;">Could you provide some
                    more insight into why this may be happening and how
                    we can further debug this issue? We are currently
                    relying on Pacemaker logs, but are there additional
                    diagnostics to further debug?<br>
                     </div>
                  <div style="font-size:medium;"> </div>
                  <div style="font-size:medium;">Thanks,</div>
                  <div style="font-size:medium;">Abithan</div>
                </div>
              </div>
              <br>
            </div>
            _______________________________________________<br>
            Manage your subscription:<br>
            <a
              href="https://lists.clusterlabs.org/mailman/listinfo/users"
              target="_blank" moz-do-not-send="true">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
            <br>
            ClusterLabs home: <a href="https://www.clusterlabs.org/"
              target="_blank" moz-do-not-send="true">https://www.clusterlabs.org/</a><br>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>