<div dir="ltr"><span style="color:rgb(37,37,37);font-family:"Red Hat Text",RedHatText,"Helvetica Neue",Arial,sans-serif;font-size:16px">Currently, Pacemaker supports the "failure-timeout" resource meta-attribute, which will automatically clear a resource's failure history once it has no new failures in that much time. </span><br><div><span style="color:rgb(37,37,37);font-family:"Red Hat Text",RedHatText,"Helvetica Neue",Arial,sans-serif;font-size:16px">Yes, I found this explanation in the Red Hat docs.</span></div><div><span style="color:rgb(37,37,37);font-family:"Red Hat Text",RedHatText,"Helvetica Neue",Arial,sans-serif;font-size:16px">Thanks.</span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jun 25, 2021 at 10:07 PM <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, 2021-06-25 at 14:41 +0800, luckydog xf wrote:<br>
> 1. Deleted the recorded failures:<br>
> crm_failcount -V -D -r nova-compute -N remote-db8-ca-3a-69-50-34 -n<br>
> monitor -I 10000<br>
> <br>
> 2. Cleaned up the resource status:<br>
> crm resource cleanup nova-compute remote-db8-ca-3a-69-50-34 force<br>
> <br>
> Problem resolved. <br>
> <br>
> But I don't know why these failure records were still there even<br>
> after the resource was running.<br>
<br>
The failure display is a history: the most recent failure is shown<br>
until the administrator has had a chance to view and investigate it,<br>
then runs cleanup manually.<br>
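For reference, the recorded failures can be inspected before clearing them. A hedged sketch using the resource and node names from this thread (crm_failcount and crm_resource as shipped with Pacemaker; adjust names for your cluster):

```shell
# Query the recorded fail count for the resource on the affected node.
crm_failcount --query -r nova-compute -N remote-db8-ca-3a-69-50-34

# Once investigated, clear the failure history manually
# (equivalent to the cleanup steps quoted above).
crm_resource --cleanup -r nova-compute -N remote-db8-ca-3a-69-50-34
```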
<br>
There is also a failure-timeout resource option to have failures get<br>
cleaned up automatically after a certain amount of time with no<br>
failures.<br>
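Setting failure-timeout could look like the following; a sketch assuming crmsh (as used earlier in the thread) or pcs, with an illustrative 10-minute expiry:

```shell
# Expire failure records automatically after 600s with no new failures.
# Resource name and timeout value are illustrative, not prescriptive.
crm resource meta nova-compute set failure-timeout 600

# Equivalent with pcs:
pcs resource update nova-compute meta failure-timeout=600
```

Note that failure-timeout trades visibility for convenience: expired failures no longer appear in crm_mon output, so pick a timeout long enough that an administrator can still notice and investigate recurring problems.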
<br>
> On Wed, Jun 23, 2021 at 5:13 PM luckydog xf <<a href="mailto:luckydogxf@gmail.com" target="_blank">luckydogxf@gmail.com</a>><br>
> wrote:<br>
> > hello, guys,<br>
> > <br>
> > I built an openstack cluster with pacemaker, all nova-compute<br>
> > nodes are running. Yet<br>
> > `crm_mon -1r` shows one nova-compute service as failed:<br>
> > ---<br>
> > Failed Actions:<br>
> > * nova-compute_monitor_10000 on remote-db8-ca-3a-69-50-34 'not<br>
> > running' (7): call=719373, status=complete, exitreason='none',<br>
> > last-rc-change='Mon Mar 1 20:27:35 2021', queued=0ms, exec=0ms<br>
> > <br>
> > ---<br>
> > It's a false alarm: nova-compute is running on that node and was<br>
> > started by pacemaker-remote.<br>
> > <br>
> > # /var/log/pacemaker.log<br>
> > attrd[4085]: notice: Update error (unknown peer uuid, retry will<br>
> > be attempted once uuid is discovered).<br>
> > <br>
> > So what's the root cause? My pacemaker is 1.1.16.<br>
> <br>
> _______________________________________________<br>
> Manage your subscription:<br>
> <a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
> <br>
> ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>
-- <br>
Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><br>
<br>
</blockquote></div>