<div dir="ltr"><span style="color:rgb(37,37,37);font-family:"Red Hat Text",RedHatText,"Helvetica Neue",Arial,sans-serif;font-size:16px">Currently, Pacemaker supports the "failure-timeout" resource meta-attribute, which will automatically clear a resource's failure history once it has no new failures in that much time. </span><br><div><span style="color:rgb(37,37,37);font-family:"Red Hat Text",RedHatText,"Helvetica Neue",Arial,sans-serif;font-size:16px">Yes, I found this explanation in the Red Hat docs.</span></div><div><span style="color:rgb(37,37,37);font-family:"Red Hat Text",RedHatText,"Helvetica Neue",Arial,sans-serif;font-size:16px">Thanks.</span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jun 25, 2021 at 10:07 PM <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, 2021-06-25 at 14:41 +0800, luckydog xf wrote:<br>
> 1. Deleted the recorded failures:<br>
> crm_failcount -V -D -r nova-compute -N remote-db8-ca-3a-69-50-34 -n<br>
> monitor -I 10000<br>
> <br>
> 2. Cleaned up the resource status:<br>
> crm resource cleanup nova-compute remote-db8-ca-3a-69-50-34 force<br>
> <br>
> Problem resolved. <br>
> <br>
> But I don't know why these failure records were still there even<br>
> after the resource was running.<br>
<br>
The failure display is a history: the most recent failure is shown<br>
until the administrator has had a chance to view and investigate it,<br>
then runs cleanup manually.<br>
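For reference, the recorded failures can be inspected before clearing them. A hedged sketch using the resource and node names from this thread (crm_failcount and crm_resource as shipped with Pacemaker; adjust names for your cluster):

```shell
# Query the recorded fail count for the resource on the affected node.
crm_failcount --query -r nova-compute -N remote-db8-ca-3a-69-50-34

# Once investigated, clear the failure history manually
# (equivalent to the cleanup steps quoted above).
crm_resource --cleanup -r nova-compute -N remote-db8-ca-3a-69-50-34
```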
<br>
There is also a failure-timeout resource option to have failures get<br>
cleaned up automatically after a certain amount of time with no<br>
failures.<br>
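Setting failure-timeout could look like the following; a sketch assuming crmsh (as used earlier in the thread) or pcs, with an illustrative 10-minute expiry:

```shell
# Expire failure records automatically after 600s with no new failures.
# Resource name and timeout value are illustrative, not prescriptive.
crm resource meta nova-compute set failure-timeout 600

# Equivalent with pcs:
pcs resource update nova-compute meta failure-timeout=600
```

Note that failure-timeout trades visibility for convenience: expired failures no longer appear in crm_mon output, so pick a timeout long enough that an administrator can still notice and investigate recurring problems.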
<br>
> On Wed, Jun 23, 2021 at 5:13 PM luckydog xf <<a href="mailto:luckydogxf@gmail.com" target="_blank">luckydogxf@gmail.com</a>><br>
> wrote:<br>
> > hello, guys,<br>
> > <br>
> > I built an openstack cluster with pacemaker, all nova-compute<br>
> > nodes are running. Yet<br>
> > `crm_mon -1r` shows one nova-compute service as failed:<br>
> > ---<br>
> > Failed Actions:<br>
> > * nova-compute_monitor_10000 on remote-db8-ca-3a-69-50-34 'not<br>
> > running' (7): call=719373, status=complete, exitreason='none',<br>
> > last-rc-change='Mon Mar 1 20:27:35 2021', queued=0ms, exec=0ms<br>
> > <br>
> > ---<br>
> > It's a false alarm: nova-compute is running on that node and was<br>
> > started by pacemaker-remote.<br>
> > <br>
> > # /var/log/pacemaker.log<br>
> > attrd[4085]: notice: Update error (unknown peer uuid, retry will<br>
> > be attempted once uuid is discovered).<br>
> > <br>
> > So what's the root cause? My pacemaker is 1.1.16.<br>
> <br>
> _______________________________________________<br>
> Manage your subscription:<br>
> <a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
> <br>
> ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>
-- <br>
Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><br>
<br>
</blockquote></div>