<div dir="ltr">Just in case, this is the monitor function from the resource agent:<div><div>ra_monitor() {</div><div># ocf_log info "$RA: [monitor]"</div><div> systemctl status ${service}</div><div> rc=$?</div><div> if [ "$rc" -eq "0" ]; then</div><div> return $OCF_SUCCESS</div><div> fi</div><div><br></div><div> ocf_log warn "$RA: [monitor] : got rc=$rc"</div><div> return $OCF_NOT_RUNNING</div><div>}</div></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr">Thank you,<div>Kostia</div></div></div></div></div></div>
<br><div class="gmail_quote">On Tue, Jan 19, 2016 at 6:30 PM, Kostiantyn Ponomarenko <span dir="ltr">&lt;<a href="mailto:konstantin.ponomarenko@gmail.com" target="_blank">konstantin.ponomarenko@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>The resource that wasn't running, but was reported as running, is "adminServer".</div><div><br></div><div>Here is a brief chronological description:</div><div> </div><div>[Jan 19 23:42:16] Pacemaker triggers its monitor function for the first time at line #1107 (those lines are from its resource agent).</div><div>[Jan 19 23:42:16] Then Pacemaker starts the resource - line #1191.</div><div>[Jan 19 11:42:53] The first failure is reported by the monitor operation at line #1543.</div><div>[Jan 19 11:42:53] The fail-count is set, but I don't see any attempt by Pacemaker to "start" the resource - judging by the logs, the start function is not called - line #1553.</div><div>[Jan 19 12:27:56] Then adminServer's monitor operation keeps returning $OCF_NOT_RUNNING - starting at line #1860.</div><div>[Jan 19 12:57:53] Then the expired fail-count is cleared at line #1969.</div><div>[Jan 19 12:57:53] Another call of the monitor function happens at line #2038.</div><div>[Jan 19 12:57:53] I assume that line #2046 means "not running" (?). 
</div><div>[Jan 19 12:57:53] The "stop" function is called - line #2150</div><div>[Jan 19 12:57:53] The "start" function is called and the resource is successfully started - line #2164<br><br></div><div><br></div><div>The time change occurred while the cluster was starting. I can see this in the output of "journalctl --since="2016-01-19" --until="2016-01-20"":</div><div><br></div><div>Jan 19 23:10:39 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c61c 0c clock_step -43193.793349 s</div><div>Jan 19 11:10:45 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c614 04 freq_mode</div><div>Jan 19 11:10:45 A2-2U12-302-LS systemd[1]: Time has been changed</div><div><br></div><div>I am attaching corosync.log.</div></div><div class="gmail_extra"><br clear="all"><div><div><div dir="ltr"><div><div dir="ltr">Thank you,<div>Kostia</div></div></div></div></div></div><div><div class="h5">
<br><div class="gmail_quote">On Tue, Jan 19, 2016 at 5:17 PM, Bogdan Dobrelya <span dir="ltr">&lt;<a href="mailto:bdobrelia@mirantis.com" target="_blank">bdobrelia@mirantis.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 19.01.2016 16:13, Ken Gaillot wrote:<br>
> On 01/19/2016 06:49 AM, Kostiantyn Ponomarenko wrote:<br>
>> One of the resources in my cluster is not actually running, but "crm_mon" shows<br>
>> it with the "Started" status.<br>
>> Its resource agent's monitor function returns "$OCF_NOT_RUNNING", but<br>
>> Pacemaker doesn't react to this at all - crm_mon shows the resource as<br>
>> Started.<br>
>> I couldn't find an explanation for this behavior, so I suppose it is a bug.<br>
>> Is it?<br>
><br>
> That is unexpected. Can you post the configuration and logs from around<br>
> the time of the issue?<br>
><br>
<br>
</span>Oh, sorry, I forgot to mention the related thread [0]. That is exactly<br>
the case I reported there. It looks the same, so I thought you had just updated<br>
my thread :)<br>
<br>
Perhaps the two threads could be merged.<br>
<br>
[0] <a href="http://clusterlabs.org/pipermail/users/2016-January/002035.html" rel="noreferrer" target="_blank">http://clusterlabs.org/pipermail/users/2016-January/002035.html</a><br>
<span><br>
><br>
> _______________________________________________<br>
> Users mailing list: <a href="mailto:Users@clusterlabs.org" target="_blank">Users@clusterlabs.org</a><br>
> <a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
><br>
> Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>
> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
> Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>
><br>
<br>
<br>
</span><span>--<br>
Best regards,<br>
Bogdan Dobrelya,<br>
Irc #bogdando<br>
<br>
</span><div><div>
</div></div></blockquote></div><br></div></div></div>
</blockquote></div><br></div>