[ClusterLabs] Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

Fri May 20 09:59:34 CEST 2016

On Fri, 20 May 2016 08:39:42 +0200,
"Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:

> >>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 19.05.2016 at
> >>> 21:29 in
> message <20160519212947.6cc0fd7b at firost>:
> [...]
> > I was thinking of a use case where a graceful demote or stop action failed
> > multiple times, to give the RA a chance to choose another method to stop
> > the resource before it requires a migration. For instance, PostgreSQL has 3
> > different kinds of stop, the last one being not graceful, but still better
> > than a kill -9.
> 
> For example the Xen RA tries a clean shutdown with a timeout of about 2/3 of
> the timeout; if it fails, it shuts the VM down the hard way.

Reading the Xen RA, I see they added a shutdown timeout escalation parameter.
This is a reasonable solution, but isn't it possible to get the action timeout
directly? I looked for such information in the past with no success.
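
To make the 2/3 rule concrete, here is a minimal sketch of how an RA could
derive a graceful-shutdown budget from the action timeout. It assumes the
cluster manager exports that timeout in milliseconds in the environment
variable OCF_RESKEY_CRM_meta_timeout; I have not verified this for every
version, so treat the variable as an assumption. The helper name
graceful_budget and its fallback argument are hypothetical.

```shell
# graceful_budget [FALLBACK_SECONDS]
# Prints the number of seconds to allow for a graceful shutdown:
# two thirds of the action timeout.
# ASSUMPTION: the action timeout is available in milliseconds as
# OCF_RESKEY_CRM_meta_timeout; if it is unset, the fallback (default 60 s)
# is used as the action timeout instead.
graceful_budget() {
    fallback=${1:-60}
    timeout_ms=${OCF_RESKEY_CRM_meta_timeout:-$(( fallback * 1000 ))}
    # Dividing by 1500 converts ms to s and takes 2/3 in one step.
    budget=$(( timeout_ms / 1500 ))
    # Never return less than one second.
    [ "$budget" -ge 1 ] || budget=1
    echo "$budget"
}
```

The remaining third of the timeout is then left for the hard shutdown, so the
whole stop action still finishes before the cluster declares it failed.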

> 
> I don't know Postgres in detail, but I could imagine a three step approach:
> 1) Shutdown after current operations have finished
> 2) Shutdown regardless of pending operations (doing rollbacks)
> 3) Shutdown the hard way, requiring recovery on the next start (I think in
> Oracle this is called a "shutdown abort")

Exactly.

> Depending on the scenario one may start at step 2)

Indeed.

> [...]
> I think RAs should not rely on "stop" being called multiple times for a
> resource to be stopped.

OK, so RAs should take care of their own escalation within a single action.
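
A rough sketch of what such an in-action escalation could look like, under
stated assumptions: try_stop_mode and is_running are hypothetical hooks, not
part of any existing RA. For PostgreSQL they could presumably wrap
"pg_ctl stop -m smart|fast|immediate" and "pg_ctl status", but that mapping is
not shown here. The total budget is simply split evenly between the modes.

```shell
# Hypothetical hooks; a real RA would override them.
try_stop_mode() { return 0; }   # $1 = mode name; asks the resource to stop
is_running()    { return 1; }   # returns 0 while the resource is still running

# escalate_stop TOTAL_SECONDS MODE...
# Tries each stop mode in order within ONE stop action. Each mode gets an
# equal share of the total budget; if the resource is still running once
# its share is used up, the next (harsher) mode is tried.
escalate_stop() {
    total=$1; shift
    share=$(( total / $# ))
    [ "$share" -ge 1 ] || share=1
    for mode in "$@"; do
        try_stop_mode "$mode"
        waited=0
        # Poll once per second until stopped or the share is exhausted.
        while is_running; do
            if [ "$waited" -ge "$share" ]; then
                break
            fi
            sleep 1
            waited=$(( waited + 1 ))
        done
        if ! is_running; then
            echo "stopped with mode: $mode"
            return 0
        fi
    done
    echo "still running after all modes"
    return 1
}
```

A stop action would then call something like
"escalate_stop 60 smart fast immediate" and map a non-zero return to an error,
leaving fencing as the last resort for the cluster.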

Thanks,