[ClusterLabs] nova-compute_monitor_10000 on 'node-xxx' not running

Ken Gaillot kgaillot at redhat.com
Fri Jun 25 10:07:14 EDT 2021


On Fri, 2021-06-25 at 14:41 +0800, luckydog xf wrote:
> 1. Deleted the recorded failures:
> crm_failcount -V -D -r nova-compute -N remote-db8-ca-3a-69-50-34 -n monitor -I 10000
> 
> 2. Cleaned up the resource status:
> crm resource cleanup nova-compute remote-db8-ca-3a-69-50-34 force
> 
> Problem resolved.
> 
> But I don't know why the failure records were still there even though
> the resource was running.

The failure display is a history, not the current state. The most
recent failure stays visible until an administrator has had a chance
to view and investigate it, and then runs a cleanup manually.
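
As a rough sketch of that view, investigate, clean up sequence, the
snippet below only prints the commands rather than running them. The
resource and node names are the ones from this thread; the helper name
show_failure_workflow is made up for illustration, and option spellings
may differ between Pacemaker versions, so check each tool's --help first.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.
# RESOURCE and NODE are taken from the report in this thread.
RESOURCE="nova-compute"
NODE="remote-db8-ca-3a-69-50-34"

# Hypothetical helper: $1 = resource id, $2 = node name
show_failure_workflow() {
    # 1. One-shot status, including inactive resources and fail counts
    echo "crm_mon -1 -r -f"
    # 2. Query the recorded fail count for the resource on that node
    echo "crm_failcount --query -r $1 -N $2"
    # 3. After investigating, clear the failure history for resource/node
    echo "crm_resource --cleanup --resource $1 --node $2"
}

show_failure_workflow "$RESOURCE" "$NODE"
```

Copy the printed lines one at a time on a cluster node, so there is a
chance to look at the output of the first two before clearing anything.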

There is also a failure-timeout resource option to have failures get
cleaned up automatically after a certain amount of time with no
failures.
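
For illustration, one way to set that option is through crm_resource as
a meta-attribute. The snippet below just prints the command; the 10min
value and the helper name failure_timeout_cmd are assumptions for the
example, not a recommendation for this cluster.

```shell
#!/bin/sh
# Sketch only: builds and prints the command instead of executing it.
# The timeout value is an illustrative assumption.
RESOURCE="nova-compute"
FAILURE_TIMEOUT="10min"

# Hypothetical helper: $1 = resource id, $2 = timeout value
failure_timeout_cmd() {
    # failure-timeout is a resource meta-attribute, hence --meta
    echo "crm_resource --resource $1 --meta --set-parameter failure-timeout --parameter-value $2"
}

failure_timeout_cmd "$RESOURCE" "$FAILURE_TIMEOUT"
```

As far as I know, on 1.1.x an expired failure is only dropped when the
cluster next re-evaluates its state (see cluster-recheck-interval), so
the entry may stay visible somewhat longer than the timeout itself.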

> On Wed, Jun 23, 2021 at 5:13 PM luckydog xf <luckydogxf at gmail.com>
> wrote:
> > Hello, guys,
> > 
> > I built an OpenStack cluster with Pacemaker, and all nova-compute
> > nodes are running. Yet `crm_mon -1r` reports one nova-compute
> > service as failed:
> > ---
> > Failed Actions:
> > * nova-compute_monitor_10000 on remote-db8-ca-3a-69-50-34 'not
> > running' (7): call=719373, status=complete, exitreason='none',
> >     last-rc-change='Mon Mar  1 20:27:35 2021', queued=0ms, exec=0ms
> > 
> > ---
> > It's a false alarm; nova-compute is running on that node and was
> > started by pacemaker-remote.
> > 
> > # /var/log/pacemaker.log
> > attrd[4085]:   notice: Update error (unknown peer uuid, retry will
> > be attempted once uuid is discovered).
> > 
> > So what's the root cause? My pacemaker is 1.1.16.
-- 
Ken Gaillot <kgaillot at redhat.com>


