[ClusterLabs] Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Jehan-Guillaume de Rorthais
jgdr at dalibo.com
Fri May 20 13:02:58 CEST 2016
On Fri, 20 May 2016 11:12:28 +0200,
"Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 20.05.2016 at 09:59
> in message <20160520095934.029c1822 at firost>:
> > On Fri, 20 May 2016 08:39:42 +0200,
> > "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >
> >> >>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 19.05.2016 at 21:29
> >> in message <20160519212947.6cc0fd7b at firost>:
> >> [...]
> >> > I was thinking of a use case where a graceful demote or stop action failed
> >> > multiple times, to give the RA a chance to choose another method to stop
> >> > the resource before it requires a migration. For instance, PostgreSQL has
> >> > 3 different kinds of stop, the last one being not graceful, but still
> >> > better than a kill -9.
> >>
> >> For example, the Xen RA tries a clean shutdown with a timeout of about 2/3 of
> >> the action timeout; if it fails, it shuts the VM down the hard way.
> >
> > Reading the Xen RA, I see they added a shutdown timeout escalation
> > parameter.
>
> Not quite:
>     if [ -n "$OCF_RESKEY_shutdown_timeout" ]; then
>         timeout=$OCF_RESKEY_shutdown_timeout
>     elif [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
>         # Allow 2/3 of the action timeout for the orderly shutdown
>         # (The original unit is ms, hence the conversion)
>         timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
>     else
>         timeout=60
>     fi
>
> > This is a reasonable solution, but isn't it possible to get the action timeout
> > directly? I looked for such information in the past with no success.
>
> See above.
Gosh, this is embarrassing... how could we miss that?
Thank you for pointing this out!
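
For the record, the arithmetic in the quoted Xen RA snippet can be sketched as a
standalone function. This is only a minimal sketch: the function name
graceful_timeout is hypothetical and not part of the Xen RA, while the two
OCF_RESKEY_* variables are the ones shown in the quote. Dividing the action
timeout (in ms) by 1500 converts it to seconds and takes 2/3 of it in a single
integer division.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not taken from the Xen RA).
# Prints the time budget, in seconds, for a graceful shutdown attempt.
graceful_timeout() {
    if [ -n "$OCF_RESKEY_shutdown_timeout" ]; then
        # An explicit resource parameter wins; it is already in seconds.
        echo "$OCF_RESKEY_shutdown_timeout"
    elif [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
        # Action timeout is in ms: x / 1000 * 2 / 3 is the same as x / 1500.
        echo $((OCF_RESKEY_CRM_meta_timeout / 1500))
    else
        # Fallback used by the quoted snippet when nothing is set.
        echo 60
    fi
}
```

With an action timeout of 120000 ms and no shutdown_timeout parameter, this
prints 80, which would leave the remaining third of the action timeout for a
harder stop method.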