<div dir="ltr">Hi Guys,<div><br></div><div>We are hitting this issue repeatedly. The fail count is not being reset to zero, and because of this some of the resources are not started on any node. Can someone please tell us what might be the cause?</div><div>Help is appreciated :-)</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Nov 23, 2015 at 11:06 AM, Pritam Kharat <span dir="ltr"><<a href="mailto:pritam.kharat@oneconvergence.com" target="_blank">pritam.kharat@oneconvergence.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Could someone please reply?</div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 19, 2015 at 10:28 PM, Pritam Kharat <span dir="ltr"><<a href="mailto:pritam.kharat@oneconvergence.com" target="_blank">pritam.kharat@oneconvergence.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div>Hi All,<div><br></div><div>I have a 2-node HA setup. I have set migration_threshold=5 and failure-timeout=120s on my resources. When the fail count reaches the migration threshold of 5, the resources are migrated to the other node. But on one occasion the fail-count was not reset to zero after 2 minutes. 
The setup stayed in that state for almost 3 hours, but the fail-count still did not reset to zero.</div><div><br></div><div>Then I tried the same test again but could not reproduce the problem. Comparing the logs of the successful run with those of the failed run, I found that in the failed run pengine took no action to clear the failcount.</div><div><br></div><div><br></div><div><br></div><div>Success logs</div><div><b><span style="font-size:13px">Nov 19 15:27:08 [16409] sc-node-1 pengine: notice: unpack_rsc_op: Clearing expired failcount for oc-service-manager on sc-node-1</span></b><br style="font-size:13px"><span style="font-size:13px">Nov 19 15:27:08 [16409] sc-node-1 pengine: info: get_failcount_full: oc-service-manager has failed 5 times on sc-node-1</span><br style="font-size:13px"><span style="font-size:13px">Nov 19 15:27:08 [16409] sc-node-1 pengine: notice: unpack_rsc_op: Clearing expired failcount for oc-service-manager on sc-node-1</span><br style="font-size:13px"><span style="font-size:13px">Nov 19 15:27:08 [16409] sc-node-1 pengine: notice: unpack_rsc_op: Re-initiated expired calculated failure oc-service-manager_last_failure_0 (rc=7, magic=0:7;3:145:0:258ae879-832f-4126-a7d7-e57bd3fdcdb1) on sc-node-1</span><br></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">Failure logs</span></div><div><span style="font-size:13px">Nov 04 22:23:39 [6831] sc-HA2 pengine: warning: unpack_rsc_op: Processing failed op monitor for oc-service-manager on sc-HA1: not running (7)</span><br 
style="font-size:13px"><span style="font-size:13px">Nov 04 22:23:39 [6831] sc-HA2 pengine: info: native_print: oc-service-manager (upstart:oc-service-manager): Started sc-HA2</span><br style="font-size:13px"><b><span style="font-size:13px">Nov 04 22:23:39 [6831] sc-HA2 pengine: info: get_failcount_full: oc-service-manager has failed 5 times on sc-HA1</span></b><br style="font-size:13px"><span style="font-size:13px">Nov 04 22:23:39 [6831] sc-HA2 pengine: warning: common_apply_stickiness: Forcing oc-service-manager away from sc-HA1 after 5 failures (max=5)</span><br style="font-size:13px"><span style="font-size:13px">Nov 04 22:23:39 [6831] sc-HA2 pengine: info: rsc_merge_weights: oc-service-manager: Rolling back scores from oc-fw-agent</span><br style="font-size:13px"><span style="font-size:13px">Nov 04 22:23:39 [6831] sc-HA2 pengine: info: LogActions: Leave oc-service-manager (Started sc-HA2)</span><span style="font-size:13px"><br></span></div><div><div><br></div><div><br></div><div>What might be the reason that, in the failure case, this action did not take place?</div><div><b><span style="font-size:13px">notice: unpack_rsc_op: Clearing expired failcount for oc-service-manager </span></b><span><font color="#888888"><br></font></span></div><span><font color="#888888"><div><b><span style="font-size:13px"><br></span></b></div><div><b><span style="font-size:13px"><br></span></b></div>-- <br><div>Thanks and Regards,<br></div><div>Pritam Kharat.<br></div>
</font></span></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Thanks and Regards,<br>Pritam Kharat.<br></div>
</div>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Thanks and Regards,<br>Pritam Kharat.<br></div>
</div>