[ClusterLabs] Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Oct 26 03:34:24 EDT 2020
>>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on 23.10.2020 at 17:06 in
message <428616368.2019191.1603465603970 at mail.yahoo.com>:
> Why don't you work with something like this: 'op stop interval=300
> timeout=600'?
I always thought "interval=" does not make any sense for "start" and "stop",
but only for "monitor"...
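To illustrate what I mean, here is a minimal sketch in crm configure syntax. The resource name vm_example, the config path and all timeout values are made up for the example. As far as I know, a non-zero interval defines a recurring operation, which is why I would expect it only on monitor, while start and stop are bounded by their timeout alone:

```
primitive vm_example ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/vm_example.xml" \
    op start timeout=120 interval=0 \
    op stop timeout=600 interval=0 \
    op monitor interval=30 timeout=30
```

With something like this, a stop that takes longer than 600 seconds would be treated as failed.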
> The stop operation will then time out according to your requirements, without
> modifying the script.
>
> Best Regards,
> Strahil Nikolov
>
> On Thursday, 22 October 2020, 23:30:08 GMT+3, Lentes, Bernd
> <bernd.lentes at helmholtz-muenchen.de> wrote:
>
> Hi guys,
>
> occasionally stopping a VirtualDomain resource via "crm resource stop" does
> not work, and in the end the node is fenced, which is ugly.
> I had a look at the RA to see what it does. It first tries to stop the domain
> via "virsh shutdown ..."; if that does not succeed within a configurable time,
> it switches to "virsh destroy".
> I assume "virsh destroy" sends a SIGKILL to the respective process. But when
> the host is doing heavy I/O, it's possible that the process is in "D" state
> (uninterruptible sleep), in which it can't be terminated with a SIGKILL. The
> node the domain is running on is then fenced because of that.
> I dug deeper and found out that the signal is often delivered a bit later
> (just a few seconds) and the process is killed, but by then pacemaker has
> already decided to fence the node.
> It's all about this excerpt from the RA:
>
> force_stop()
> {
>     local out ex translate
>     local status=0
>
>     ocf_log info "Issuing forced shutdown (destroy) request for domain ${DOMAIN_NAME}."
>     out=$(LANG=C virsh $VIRSH_OPTIONS destroy ${DOMAIN_NAME} 2>&1)
>     ex=$?
>     translate=$(echo $out|tr 'A-Z' 'a-z')
>     echo >&2 "$translate"
>     case $ex$translate in
>         *"error:"*"domain is not running"*|*"error:"*"domain not found"*|\
>         *"error:"*"failed to get domain"*)
>             : ;; # unexpected path to the intended outcome, all is well
>         [!0]*)
>             ocf_exit_reason "forced stop failed"
>             return $OCF_ERR_GENERIC ;;
>         0*)
>             while [ $status != $OCF_NOT_RUNNING ]; do
>                 VirtualDomain_status
>                 status=$?
>             done ;;
>     esac
>     return $OCF_SUCCESS
> }
>
> I'm thinking about the following:
> How about letting the script wait a bit after "virsh destroy"? I saw that
> it usually takes just a few seconds until "virsh destroy" is successful.
> I'm thinking about this change:
>
>     ocf_log info "Issuing forced shutdown (destroy) request for domain ${DOMAIN_NAME}."
>     out=$(LANG=C virsh $VIRSH_OPTIONS destroy ${DOMAIN_NAME} 2>&1)
>     ex=$?
>     sleep 10    # <== proposed change (or maybe configurable)
>     translate=$(echo $out|tr 'A-Z' 'a-z')
>
>
> What do you think ?
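One caveat with the sleep as proposed: ex is captured before the sleep, so if "virsh destroy" itself returns non-zero, the case statement would still take the [!0]* branch after the wait. A possible alternative is to poll the status for a bounded time instead of sleeping unconditionally. Below is a minimal sketch of that idea; wait_for_stop and its arguments are hypothetical names, not part of the RA, and the OCF exit codes are defined locally only to keep the snippet self-contained:

```shell
#!/bin/sh
# Sketch only, not taken from the VirtualDomain RA.
# OCF exit codes, defined here so the snippet runs on its own.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# wait_for_stop STATUS_CMD MAX_SECONDS   (hypothetical helper)
# Calls STATUS_CMD about once per second. Returns OCF_SUCCESS as soon as
# STATUS_CMD exits with OCF_NOT_RUNNING, or OCF_ERR_GENERIC if that has
# not happened after MAX_SECONDS seconds of waiting.
wait_for_stop()
{
    status_cmd=$1
    max_seconds=$2
    waited=0

    while :; do
        rc=0
        "$status_cmd" || rc=$?
        if [ "$rc" -eq "$OCF_NOT_RUNNING" ]; then
            return "$OCF_SUCCESS"
        fi
        if [ "$waited" -ge "$max_seconds" ]; then
            return "$OCF_ERR_GENERIC"
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Inside force_stop() this could be called as "wait_for_stop VirtualDomain_status 10" before deciding that the forced stop has failed. The total wait still has to fit inside the timeout of the stop operation, otherwise the stop times out anyway.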
>
> Bernd
>
>
> --
>
> Bernd Lentes
> Systemadministration
> Institute for Metabolism and Cell Death (MCD)
> Building 25 - office 122
> HelmholtzZentrum München
> bernd.lentes at helmholtz-muenchen.de
> phone: +49 89 3187 1241
> phone: +49 89 3187 3827
> fax: +49 89 3187 2294
> http://www.helmholtz-muenchen.de/mcd
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/