[ClusterLabs] Antw: Re: Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri May 20 09:12:28 UTC 2016
>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 20.05.2016 at
09:59 in message <20160520095934.029c1822 at firost>:
> On Fri, 20 May 2016 08:39:42 +0200,
> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
>> >>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 19.05.2016 at
>> >>> 21:29 in message <20160519212947.6cc0fd7b at firost>:
>> [...]
>> > I was thinking of a use case where a graceful demote or stop action
>> > failed multiple times, and of giving the RA a chance to choose another
>> > method to stop the resource before it requires a migration. For
>> > instance, PostgreSQL has 3 different kinds of stop, the last one being
>> > not graceful, but still better than a kill -9.
>>
>> For example the Xen RA tries a clean shutdown with a timeout of about 2/3
>> of the action timeout; if it fails it shuts the VM down the hard way.
>
> Reading the Xen RA, I see they added a shutdown timeout escalation
> parameter.
Not quite:
    if [ -n "$OCF_RESKEY_shutdown_timeout" ]; then
        timeout=$OCF_RESKEY_shutdown_timeout
    elif [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
        # Allow 2/3 of the action timeout for the orderly shutdown
        # (The origin unit is ms, hence the conversion)
        timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
    else
        timeout=60
    fi
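To make the arithmetic concrete, here is a minimal sketch that wraps the same logic in a function. The name orderly_timeout is hypothetical and not part of the Xen RA; the variables are the ones used in the snippet above. Dividing the millisecond value by 1500 is the same as taking 2/3 of it and converting to seconds, so an action timeout of 90000 ms leaves 60 s for the orderly shutdown.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Xen RA): print the number of seconds
# allowed for the orderly shutdown, following the snippet above.
orderly_timeout() {
    if [ -n "$OCF_RESKEY_shutdown_timeout" ]; then
        # An explicit shutdown_timeout parameter wins
        echo "$OCF_RESKEY_shutdown_timeout"
    elif [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
        # ms * 2/3 / 1000 == ms / 1500 (integer division)
        echo $((OCF_RESKEY_CRM_meta_timeout / 1500))
    else
        # Fallback when neither value is available
        echo 60
    fi
}

# Example values, chosen only for illustration
unset OCF_RESKEY_shutdown_timeout
OCF_RESKEY_CRM_meta_timeout=90000
orderly_timeout
```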
> This is a reasonable solution, but isn't it possible to get the action
> timeout directly? I looked for such information in the past with no
> success.
See above: the action timeout is available in OCF_RESKEY_CRM_meta_timeout
(in milliseconds).
>
>>
>> I don't know Postgres in detail, but I could imagine a three-step
>> approach:
>> 1) Shut down after current operations have finished
>> 2) Shut down regardless of pending operations (doing rollbacks)
>> 3) Shut down the hard way, requiring recovery on the next start (I think in
>> Oracle this is called a "shutdown abort")
>
> Exactly.
>
>> Depending on the scenario one may start at step 2)
>
> Indeed.
>
>> [...]
>> I think RAs should not rely on "stop" being called multiple times for a
>> resource to be stopped.
>
> OK, so the RA should take care of its own escalation during a single
> action.
>
> Thanks,
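Such an escalation within one stop action could be sketched as follows. This is only an illustration under stated assumptions: try_stop, escalating_stop and SIMULATE_SUCCESS_AT are hypothetical names, the equal split of the time budget is arbitrary, and try_stop is a stub standing in for the real stop command (for PostgreSQL presumably pg_ctl stop with its smart, fast and immediate modes).

```shell
#!/bin/sh
# Hypothetical stub: pretend to stop the resource in mode $1 within $2 seconds.
# A real RA would run the actual stop command here and wait for it.
# For the demonstration it succeeds only for the mode named in
# SIMULATE_SUCCESS_AT (default: immediate).
try_stop() {
    echo "trying $1 stop for ${2}s"
    [ "$1" = "${SIMULATE_SUCCESS_AT:-immediate}" ]
}

# Escalate from the gentlest to the harshest mode inside one stop action.
# $1 is the time budget in seconds for the whole action.
escalating_stop() {
    per_step=$(($1 / 3))    # arbitrary equal split, for illustration only
    for mode in smart fast immediate; do
        if try_stop "$mode" "$per_step"; then
            echo "stopped with $mode"
            return 0
        fi
    done
    return 1                # every mode failed: report the stop as failed
}

# Example run: the graceful mode fails, the second one succeeds
SIMULATE_SUCCESS_AT=fast
escalating_stop 60
```

Starting the loop at the second mode would cover the scenario where step 1 is skipped.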
Regards,
Ulrich