[ClusterLabs] Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

Fri May 20 08:39:42 CEST 2016

>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 19.05.2016 at 21:29 in
message <20160519212947.6cc0fd7b at firost>:
[...]
> I was thinking of a use case where a graceful demote or stop action failed
> multiple times, and of giving the RA a chance to choose another method to
> stop the resource before it requires a migration. For instance, PostgreSQL
> has 3 different kinds of stop, the last one being not graceful, but still
> better than a kill -9.

For example, the Xen RA tries a clean shutdown with a timeout of about 2/3 of the operation's timeout; if that fails, it shuts the VM down the hard way.
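A minimal Python sketch of that "graceful first, forced after about 2/3 of the budget" pattern follows. It is not the Xen RA's code (that RA is a shell script); `graceful_stop`, `hard_stop` and `is_stopped` are hypothetical callbacks standing in for whatever the real resource offers, and `op_timeout` is assumed to be the stop operation's timeout in seconds.

```python
import time
from typing import Callable


def stop_with_fallback(
    graceful_stop: Callable[[], None],
    hard_stop: Callable[[], None],
    is_stopped: Callable[[], bool],
    op_timeout: float,
    graceful_fraction: float = 2 / 3,
    poll_interval: float = 0.5,
) -> str:
    """Try a graceful stop for a fraction of the operation timeout, then force it.

    Returns "graceful" or "forced" depending on which path was used.
    """
    # Reserve only part of the budget for the clean shutdown, so the hard
    # stop still has time to complete before the cluster's timeout expires.
    deadline = time.monotonic() + op_timeout * graceful_fraction
    graceful_stop()
    while time.monotonic() < deadline:
        if is_stopped():
            return "graceful"
        time.sleep(poll_interval)
    # One last check right at the deadline before escalating.
    if is_stopped():
        return "graceful"
    hard_stop()
    return "forced"
```

The remaining third of the timeout is what keeps the forced path from itself running into the operation timeout.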

I don't know Postgres in detail, but I could imagine a three-step approach:
1) Shut down after current operations have finished
2) Shut down regardless of pending operations (doing rollbacks)
3) Shut down the hard way, requiring recovery on the next start (I think in Oracle this is called a "shutdown abort")

Depending on the scenario, one may start directly at step 2.
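The three steps can be sketched as a single stop invocation that escalates on its own, with an optional starting step. This is a hedged Python illustration, not an existing RA: the step actions are hypothetical callbacks, and the mapping to `pg_ctl stop -m smart`, `-m fast` and `-m immediate` is only an assumed rough correspondence (smart waits for clients to disconnect, fast rolls back and disconnects, immediate skips a clean shutdown so recovery runs on the next start).

```python
import time
from typing import Callable, Sequence, Tuple

# (name, action) pairs ordered from gentlest to harshest; names are illustrative.
Step = Tuple[str, Callable[[], None]]


def escalating_stop(
    steps: Sequence[Step],
    is_stopped: Callable[[], bool],
    op_timeout: float,
    start_at: int = 0,
    poll_interval: float = 0.5,
) -> str:
    """Escalate through shutdown methods within ONE stop call.

    The time left is split equally among the steps still to run. Returns the
    name of the step that stopped the resource, or raises RuntimeError.
    """
    if not 0 <= start_at < len(steps):
        raise ValueError("start_at out of range")
    end = time.monotonic() + op_timeout
    pending = steps[start_at:]
    for i, (name, action) in enumerate(pending):
        steps_left = len(pending) - i
        step_deadline = time.monotonic() + (end - time.monotonic()) / steps_left
        action()
        while True:
            if is_stopped():
                return name
            remaining = step_deadline - time.monotonic()
            if remaining <= 0:
                break  # give up on this method and escalate
            time.sleep(min(poll_interval, remaining))
    raise RuntimeError("resource did not stop within the operation timeout")
```

Passing `start_at=1` skips the gentlest method, as in the "start at step 2" case, and because all escalation happens inside one call the RA does not depend on "stop" being invoked again.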

[...]
I think RAs should not rely on "stop" being called multiple times for a resource to be stopped.

Regards,
Ulrich