[ClusterLabs] Antw: Re: Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Oct 27 03:27:33 EDT 2020


>>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on 26.10.2020 at 17:54 in
message <293971081.2615607.1603731258371 at mail.yahoo.com>:
> I think it's useful - for example, HANA takes 10-15 min to power up (even more,
> depending on the storage tier) - so the default will time out and the fun starts
> there.

Hi!

VMs are a classic case where "one size fits all" doesn't work: for migration
we need customized timeouts that depend on the size of the VM's RAM and the ratio
of dirty pages (databases that write heavily through big buffers are bad
candidates for live migration, for example). OTOH you don't want your timeouts
to be longer than necessary in case something goes wrong. Well, you can never
cover 100%, but 95-99% is rather good.
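
To make the dependency on RAM size and dirty pages concrete, here is a rough
sketch of how such a timeout could be estimated. The function name, the
parameters and the formula (RAM divided by the bandwidth left over after
re-sending dirtied pages, times a safety margin) are my own assumptions for
illustration only; this is not what the RA or libvirt computes.

```shell
# Hypothetical helper (assumption, not part of any resource agent):
# estimate_migration_timeout RAM_MB BANDWIDTH_MB_PER_S DIRTY_MB_PER_S [SAFETY_PERCENT]
# Prints an estimated migrate timeout in seconds.
estimate_migration_timeout() {
    local ram_mb="$1"
    local bandwidth="$2"
    local dirty_rate="$3"
    local safety="${4:-150}"    # 150 means "add 50% headroom" (assumed default)

    # If pages are dirtied at least as fast as they can be sent,
    # this simple model never converges, so refuse to give a number.
    if [ "$dirty_rate" -ge "$bandwidth" ]; then
        echo "dirty rate >= bandwidth: no sensible timeout" >&2
        return 1
    fi

    local net=$((bandwidth - dirty_rate))
    # Ceiling division: seconds needed to copy all RAM at the net rate.
    local base=$(( (ram_mb + net - 1) / net ))
    # Apply the safety margin, again rounding up.
    echo $(( (base * safety + 99) / 100 ))
}
```

With these assumptions, "estimate_migration_timeout 16384 1000 200" prints 32
for a 16 GB guest, and the function fails outright when the dirty rate
reaches the available bandwidth.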

Regards,
Ulrich


> 
> Maybe the cluster is just showing them without using them, but it looked
> quite the opposite.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Monday, 26 October 2020, 09:34:31 GMT+2, Ulrich Windl
> <ulrich.windl at rz.uni-regensburg.de> wrote:
> 
> 
> 
> 
> 
>>>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on 23.10.2020 at 17:06 in
> message <428616368.2019191.1603465603970 at mail.yahoo.com>:
>> Why don't you work with something like this: 'op stop interval=300
>> timeout=600'?
> 
> I always thought "interval=" does not make any sense for "start" and "stop",
> but only for "monitor"...
> 
>> The stop operation will then time out according to your requirements, without
>> modifying the script.
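
For illustration, a stop timeout of that kind could be set in crmsh roughly as
follows. The resource name and the config path are made up, and interval=0 is
used for "stop" on the assumption that only recurring operations such as
"monitor" need a non-zero interval.

```shell
# Hypothetical resource name and libvirt XML path; adjust to your cluster.
crm configure primitive vm_example ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/vm_example.xml" \
    op stop interval=0 timeout=600 \
    op monitor interval=30 timeout=30
```

This is a configuration sketch only; it needs a running cluster with crmsh
installed.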
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> 
>> 
>> On Thursday, 22 October 2020, 23:30:08 GMT+3, Lentes, Bernd
>> <bernd.lentes at helmholtz-muenchen.de> wrote:
>> 
>> 
>> 
>> 
>> 
>> Hi guys,
>> 
>> Occasionally, stopping a VirtualDomain resource via "crm resource stop" does
>> not work, and in the end the node is fenced, which is ugly.
>> I had a look at the RA to see what it does. After trying to stop the domain
>> via "virsh shutdown ..." for a configurable time, it switches to "virsh
>> destroy".
>> I assume "virsh destroy" sends a SIGKILL to the respective process. But when
>> the host is doing heavy I/O, it's possible that the process is in "D" state
>> (uninterruptible sleep), in which it can't be terminated with a SIGKILL.
>> The node the domain is running on is then fenced because of that.
>> I dug deeper and found out that the signal is often delivered a bit later
>> (just a few seconds) and the process is killed, but pacemaker has already
>> decided to fence the node.
>> It's all about this excerpt from the RA:
>> 
>> force_stop()
>> {
>>        local out ex translate
>>        local status=0
>> 
>>        ocf_log info "Issuing forced shutdown (destroy) request for domain ${DOMAIN_NAME}."
>>        out=$(LANG=C virsh $VIRSH_OPTIONS destroy ${DOMAIN_NAME} 2>&1)
>>        ex=$?
>>        translate=$(echo $out|tr 'A-Z' 'a-z')
>>        echo >&2 "$translate"
>>        case $ex$translate in
>>                *"error:"*"domain is not running"*|*"error:"*"domain not found"*|\
>>                *"error:"*"failed to get domain"*)
>>                        : ;; # unexpected path to the intended outcome, all is well
>>                [!0]*)
>>                        ocf_exit_reason "forced stop failed"
>>                        return $OCF_ERR_GENERIC ;;
>>                0*)
>>                        while [ $status != $OCF_NOT_RUNNING ]; do
>>                                VirtualDomain_status
>>                                status=$?
>>                        done ;;
>>        esac
>>        return $OCF_SUCCESS
>> }
>> 
>> I'm thinking about the following:
>> How about letting the script wait a bit after "virsh destroy"? I saw that
>> it usually takes just a few seconds until "virsh destroy" succeeds.
>> I'm thinking about this change:
>> 
>>        ocf_log info "Issuing forced shutdown (destroy) request for domain ${DOMAIN_NAME}."
>>        out=$(LANG=C virsh $VIRSH_OPTIONS destroy ${DOMAIN_NAME} 2>&1)
>>        ex=$?
>>        sleep 10    # <== proposed addition (or maybe configurable)
>>        translate=$(echo $out|tr 'A-Z' 'a-z')
>> 
>> 
>> What do you think ?
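
One possible variation on the fixed sleep proposed above is a bounded polling
loop that returns as soon as the domain is gone and gives up after a maximum
wait. The sketch below is only an illustration: wait_until_stopped and the
check command passed to it are hypothetical names, not functions of the
VirtualDomain RA, and the check is assumed to exit with 0 once the domain no
longer runs.

```shell
# Hypothetical helper, not part of the VirtualDomain RA.
# wait_until_stopped CHECK_CMD MAX_SECONDS
# Runs CHECK_CMD about once per second. Returns 0 as soon as CHECK_CMD
# succeeds (meaning "domain is stopped"), or 1 if it still fails after
# MAX_SECONDS seconds of waiting.
wait_until_stopped() {
    local check_cmd="$1"
    local max_seconds="$2"
    local waited=0

    while [ "$waited" -lt "$max_seconds" ]; do
        if "$check_cmd"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # One last check after the final sleep.
    if "$check_cmd"; then
        return 0
    fi
    return 1
}
```

In the RA this might be called after "virsh destroy" with a small wrapper
that compares the result of VirtualDomain_status against $OCF_NOT_RUNNING,
for example "wait_until_stopped domain_is_stopped 10" (domain_is_stopped being
such an assumed wrapper). Compared with a plain "sleep 10", the common case
does not pay the full delay.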
>> 
>> Bernd
>> 
>> 
>> -- 
>> 
>> Bernd Lentes 
>> Systemadministration 
>> Institute for Metabolism and Cell Death (MCD) 
>> Building 25 - office 122 
>> HelmholtzZentrum München 
>> bernd.lentes at helmholtz-muenchen.de 
>> phone: +49 89 3187 1241 
>> phone: +49 89 3187 3827 
>> fax: +49 89 3187 2294 
>> http://www.helmholtz-muenchen.de/mcd 
>> 
>> stay healthy
>> Helmholtz Zentrum München
>> 
>> Helmholtz Zentrum Muenchen
>> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
>> Ingolstaedter Landstr. 1
>> 85764 Neuherberg
>> www.helmholtz-muenchen.de 
>> Aufsichtsratsvorsitzende: MinDir.in Prof. Dr. Veronika von Messling
>> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin 
>> Guenther
>> Registergericht: Amtsgericht Muenchen HRB 6466
>> USt-IdNr: DE 129521671
>> 
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> ClusterLabs home: https://www.clusterlabs.org/ 
> 
> 
> 
> 




