[ClusterLabs] Antw: Re: Resource won't start, crm_resource -Y does not help

Ken Gaillot kgaillot at redhat.com
Mon Jul 22 12:14:59 EDT 2019


On Mon, 2019-07-22 at 15:45 +0200, Ulrich Windl wrote:
> Hi!
> 
> My RA actually sends OCF_ERR_ARGS if checking the args detects a problem.
> But as the error can sometimes be resolved without changing the args
> (e.g. providing some resource by other means), I suspect CRM does not
> handle that properly. Even after a resource cleanup.
> 
> My RA logs any parameter check, and I can see that no parameter check is
> being performed...
> 
> I also noticed that the "invalid parameter" persists on a node even after
> restarting pacemaker on that node.

Pacemaker treats OCF_ERR_ARGS as a "hard" failure, meaning it won't be
retried on the same node. But it should attempt to start on any other
eligible nodes.
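
As a rough illustration of that distinction, the sketch below maps OCF exit
codes to the way a failure is treated. The function name ocf_failure_type
is made up for this example, and the classification is an assumption based
on the return-code table in Pacemaker Explained, so check it against the
version you run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker or resource-agents):
# classify an OCF exit code by how a failure with that code is treated.
ocf_failure_type() {
    case "$1" in
        0)       echo "success" ;;
        1)       echo "soft" ;;   # OCF_ERR_GENERIC: may be retried on the same node
        2|3|4|5) echo "hard" ;;   # ERR_ARGS, ERR_UNIMPLEMENTED, ERR_PERM,
                                  # ERR_INSTALLED: not retried on this node
        6)       echo "fatal" ;;  # OCF_ERR_CONFIGURED: not retried on any node
        *)       echo "other" ;;  # codes this sketch does not cover
    esac
}

ocf_failure_type 2   # prints "hard"
```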

The failure should be cleared by either cleanup or pacemaker restart.
That's the mystery here. I can't even imagine how it would be possible
to survive a pacemaker restart -- are you sure it wasn't simply a new
attempt getting the same result?

> 
> So:
> # crm_resource -r prm_idredir_test -VV start
>  warning: unpack_rsc_op_failure:        Processing failed start of prm_idredir_test on h02: invalid parameter | rc=2
> 
> (Start was not even tried)
> 
> Eventually I was able to start the resource. Some other process was using
> a socket address that my resource needed...

Since you control the RA, you might want to set exit reasons, which
will be shown in the status display (the exitreason='' in your output
below). There's an ocf_exit_reason convenience function, e.g.

   ocf_exit_reason "Some other process has the socket address in use"
   exit $OCF_ERR_ARGS
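
A slightly fuller sketch of how that could look inside a validate step is
shown here. Everything specific in it is hypothetical: the function names
idredir_validate and addr_in_use, the parameter "address", and the
BUSY_ADDR variable are invented for illustration. The fallback definitions
only exist so the snippet runs outside a cluster; a real RA would source
ocf-shellfuncs, which provides ocf_exit_reason and the OCF_* codes.

```shell
#!/bin/sh
# Fallbacks for running this sketch standalone (assumption: a real RA
# gets these from ocf-shellfuncs instead).
: "${OCF_SUCCESS:=0}" "${OCF_ERR_ARGS:=2}"
command -v ocf_exit_reason >/dev/null 2>&1 || ocf_exit_reason() {
    echo "ocf-exit-reason:$*" >&2
}

# Hypothetical stand-in: a real RA would ask the system whether the
# address is already bound. Here it just compares against BUSY_ADDR.
addr_in_use() {
    [ "$1" = "${BUSY_ADDR:-}" ]
}

# Hypothetical validate step: set an exit reason before every error
# return so the status display can say why the start failed.
idredir_validate() {
    if [ -z "${OCF_RESKEY_address:-}" ]; then
        ocf_exit_reason "Required parameter 'address' is not set"
        return "$OCF_ERR_ARGS"
    fi
    if addr_in_use "$OCF_RESKEY_address"; then
        ocf_exit_reason "Some other process has the socket address in use"
        return "$OCF_ERR_ARGS"
    fi
    return "$OCF_SUCCESS"
}
```

The return code for the busy socket is kept the same as in the two-line
example above; only the exit reason is new.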

> 
> Regards,
> Ulrich
> 
> 
> Oyvind Albrigtsen <oalbrigt at redhat.com> wrote on 22.07.2019 at 14:09
> in message <20190722120911.uz4cgybmxsced3n6 at redhat.com>:
> > Sounds like your RA returns e.g. OCF_ERR_ARGS or similar where it
> > shouldn't.
> > 
> > Try starting the resource with crm_resource and add -VV, which should
> > show you the code as it's being run.
> > 
> > On 22/07/19 13:55 +0200, Ulrich Windl wrote:
> > > Hi!
> > > 
> > > Playing with some new RA that won't start, I found this in
> > > crm_resource's man page:
> > >       -Y, --why
> > >              Show why resources are not running, optionally
> > >              filtered by --resource and/or --node
> > > 
> > > When I tried it, all I got was:
> > > # crm_resource -r prm_idredir_test -Y
> > > Resource prm_idredir_test is not running
> > > 
> > > 8-( Not quite what I expected. Another command gave me:
> > > 
> > > prm_idredir_test_start_0 on h02 'invalid parameter' (2): call=76,
> > > status=complete, exitreason='',
> > > last-rc-change='Mon Jul 22 13:06:45 2019', queued=0ms, exec=27ms
> > > 
> > > Unfortunately the last resource cleanup was significantly _after_
> > > the time logged above, and it seems the CRM did not even re-try to
> > > start the RA.
> > > 
> > > Could this be a bug in SLES12 SP4
> > > (pacemaker-1.1.19+20181105.ccd6b5b10-3.10.1.x86_64)?
> > > 
> > > Regards,
> > > Ulrich
-- 
Ken Gaillot <kgaillot at redhat.com>


