<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 5/29/21 12:05 AM, Strahil Nikolov
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:1607468630.724688.1622239546135@mail.yahoo.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
Most RA scripts are written in bash.
<div id="yMail_cursorElementTracker_1622239249817">Usually you can
change the shebang to '#!/usr/bin/bash -x', or you can set
trace_ra=1 via 'pcs resource update RESOURCE trace_ra=1
trace_file=/somepath'.</div>
<div id="yMail_cursorElementTracker_1622239377845"><br>
</div>
<div id="yMail_cursorElementTracker_1622239378556">If you don't
define trace_file, the trace files should be created under
/var/lib/heartbeat/trace_ra (that path is from memory, so use
find/locate to confirm).</div>
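<div>A minimal sketch for locating fresh trace output, assuming the default directory mentioned above (it may differ on your distribution). The helper name recent_traces and the TRACE_DIR variable are made up for illustration; -mmin is supported by GNU and BSD find but is not part of POSIX.</div>

```shell
#!/bin/sh
# Assumed default trace directory; override TRACE_DIR if yours differs.
TRACE_DIR="${TRACE_DIR:-/var/lib/heartbeat/trace_ra}"

# Hypothetical helper: print trace files modified in the last N minutes
# (default 10), sorted by path.
recent_traces() {
    minutes="${1:-10}"
    if [ ! -d "$TRACE_DIR" ]; then
        echo "no trace directory at $TRACE_DIR" >&2
        return 1
    fi
    find "$TRACE_DIR" -type f -mmin "-$minutes" | sort
}
```

<div>For example, running 'recent_traces 5' right after a failed test iteration would list whatever trace files were written in the last five minutes.</div>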
<div id="yMail_cursorElementTracker_1622239515230"><br>
</div>
<div id="yMail_cursorElementTracker_1622239515437">Best Regards,</div>
<div id="yMail_cursorElementTracker_1622239520757">Strahil Nikolov</div>
<div id="yMail_cursorElementTracker_1622239524008"><br>
</div>
<div id="yMail_cursorElementTracker_1622239513238">
<blockquote style="margin: 0 0 20px 0;">
<div style="font-family:Roboto, sans-serif; color:#6D00F6;">
<div>On Fri, May 28, 2021 at 22:10, Abithan Kumarasamy</div>
<div><a class="moz-txt-link-rfc2396E" href="mailto:Abithan.Kumarasamy@ibm.com">&lt;Abithan.Kumarasamy@ibm.com&gt;</a> wrote:</div>
</div>
<div style="padding: 10px 0 0 20px; margin: 10px 0 0 0;
border-left: 1px solid #6D00F6;">
<div id="yiv2500575883">
<div class="yiv2500575883socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica,
sans-serif;font-size:10pt;">
<div dir="ltr">
<div style="font-size:medium;">Hello Team,</div>
<div style="font-size:medium;"> </div>
<div style="font-size:medium;">We have recently been
running tests on our Pacemaker clusters, which have
two Pacemaker resources, one on each of two nodes.
The test case in which we see intermittent problems
brings down the Pacemaker resources on both nodes
simultaneously. Our expected behaviour is that the
monitor function in our resource agent script
detects the downtime and that Pacemaker then issues
a start command. This is what happens on most
iterations of the test case. However, on some
iterations (approximately 1 out of 30) we notice
that Pacemaker issues the start command on only
one of the hosts. On the troubled host, the monitor
function logs that the resource is down, as
expected, and exits with the OCF_ERR_GENERIC return
code (1). According to the documentation, this
should trigger a soft recovery, but when scanning
the Pacemaker logs we find no indication of the
start command being issued or invoked. On the other
host, it works as expected.</div>
<div style="font-size:medium;"> </div>
<div style="font-size:medium;">To summarize the issue:</div>
<ol>
<li><span style="font-size:12pt;">The resource’s
monitor is running and returning OCF_ERR_GENERIC</span></li>
<li><span style="font-size:12pt;">The constraints we
have for the resources are satisfied.</span></li>
<li><span style="font-size:12pt;">There are no
visible differences in the Pacemaker logs
between the test iteration that failed, and the
multiple successful iterations, other than the
fact that Pacemaker does not start the resource
after the monitor returns OCF_ERR_GENERIC</span><br>
</li>
</ol>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
In general, Pacemaker won't simply start a resource after receiving<br>
OCF_ERR_GENERIC from the monitor. As you already mentioned,<br>
it will try to recover the resource to a known state by first<br>
stopping it, and the resource has to be reported as stopped<br>
after that. Only then will it try to restart it, if the rules say so.<br>
Which resource agent are you using? If you brought down<br>
the resource manually, the monitor shouldn't report OCF_ERR_GENERIC<br>
but that the resource is stopped (OCF_NOT_RUNNING, 7).<br>
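<br>
As a rough sketch of that distinction (not your agent's code): a monitor<br>
can report a cleanly stopped resource as OCF_NOT_RUNNING (7) and reserve<br>
OCF_ERR_GENERIC (1) for a failed state. The pidfile path and the function<br>
name myservice_monitor are hypothetical, and the exit codes are defined<br>
inline only to keep the example self-contained; a real agent would get<br>
them by sourcing ocf-shellfuncs.<br>

```shell
#!/bin/sh
# Exit codes from the OCF resource agent API, defined inline for this sketch.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical pidfile location; a real agent would take it from a parameter.
PIDFILE="${PIDFILE:-/var/run/myservice.pid}"

myservice_monitor() {
    # No pidfile at all: treat the service as cleanly stopped.
    if [ ! -e "$PIDFILE" ]; then
        return "$OCF_NOT_RUNNING"
    fi
    pid=$(cat "$PIDFILE" 2>/dev/null)
    # Pidfile present and the process accepts signal 0: running.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return "$OCF_SUCCESS"
    fi
    # Pidfile left behind without a live process: report a failure.
    return "$OCF_ERR_GENERIC"
}
```

With a monitor along these lines, a manual shutdown that removes the<br>
pidfile is reported as stopped rather than as a failure.<br>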
<br>
Regards,<br>
Klaus<br>
<blockquote type="cite"
cite="mid:1607468630.724688.1622239546135@mail.yahoo.com">
<div id="yMail_cursorElementTracker_1622239513238">
<blockquote style="margin: 0 0 20px 0;">
<div style="padding: 10px 0 0 20px; margin: 10px 0 0 0;
border-left: 1px solid #6D00F6;">
<div id="yiv2500575883">
<div class="yiv2500575883socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica,
sans-serif;font-size:10pt;">
<div dir="ltr">
<div style="font-size:medium;">Could you provide some
more insight into why this may be happening and how
we can further debug this issue? We are currently
relying on Pacemaker logs, but are there additional
diagnostics to further debug?<br>
</div>
<div style="font-size:medium;"> </div>
<div style="font-size:medium;">Thanks,</div>
<div style="font-size:medium;">Abithan</div>
</div>
</div>
<br>
</div>
_______________________________________________<br>
Manage your subscription:<br>
<a
href="https://lists.clusterlabs.org/mailman/listinfo/users"
target="_blank" moz-do-not-send="true">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
<br>
ClusterLabs home: <a href="https://www.clusterlabs.org/"
target="_blank" moz-do-not-send="true">https://www.clusterlabs.org/</a><br>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</body>
</html>