[ClusterLabs] VirtualDomain restart caused fencing.
kgaillot at redhat.com
Wed Jun 30 12:11:25 EDT 2021
On Wed, 2021-06-30 at 08:40 -0700, Matthew Schumacher wrote:
> Hello,
>
> I'm not sure how to fix this, but calling 'crm resource restart
> vm-name' this morning caused an entire node to get fenced, kicking the
> stool out from under a number of VMs.
>
> Looking at VirtualDomain it looks like the system defaults to a 90s
> timeout, and if it can't gracefully shut down the VM with 'virsh
> shutdown' in 85s, then it calls 'virsh destroy'. For whatever
> reason, that's not what happened.
That would be the mystery to solve. It sounds like the node was fenced
because the stop failed, so that would be where to investigate.
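
As a starting point for that investigation, here is a minimal sketch of the
timing described in the quoted message: the graceful 'virsh shutdown' window
is the stop timeout minus a 5-second margin (90s gives 85s). The helper name
grace_window is hypothetical, and the margin is taken only from the figures
quoted above, not from the agent's source. If the same margin applied in the
test below, the 15-second gap between the two log lines would correspond to
a 20-second stop timeout rather than 90. That is an inference from the
timestamps, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: seconds allowed for a graceful 'virsh shutdown'
# before falling back to 'virsh destroy', given the stop timeout in seconds.
# The 5-second margin is an assumption based on the 90s/85s figures quoted above.
grace_window() {
    stop_timeout=$1
    echo $(( stop_timeout - 5 ))
}

grace_window 90   # stop timeout described in the quoted message
grace_window 20   # a stop timeout that would produce a 15s graceful window
```

Comparing the computed window with the gap between the "graceful shutdown"
and "forced shutdown" log lines is one way to check which stop timeout was
actually in effect.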
> I created a mockup where I moved a test vm to its own node (in case
> it gets fenced), then loaded something that would ignore acpi
> shutdown, then called restart. This time it worked. The logs show:
>
> Jun 30 15:32:11 VirtualDomain(vm-testvm)[13047]: INFO: Issuing
> graceful shutdown request for domain testvm.
> Jun 30 15:32:26 VirtualDomain(vm-testvm)[13047]: INFO: Issuing
> forced shutdown (destroy) request for domain testvm.
>
> I don't have the logs from the original failure due to my node not
> being persistent, but I wonder if anyone else has run into this.
>
> Here is my resource configuration if that reveals the issue:
>
> crm configure primitive vm-testvm2 VirtualDomain params
> config="/datastore/vm/testvm/testvm.xml" migration_transport=ssh meta
> allow-migrate=true target-role=Started op monitor timeout=30
> interval=30
>
> Oh, one last question: Can I disable fencing for a specific resource
> for testing reasons? I'd love to watch this break without fear of
> fencing.
Yes, for this scenario, configuring on-fail=block for the stop
operation would cause the cluster to leave the VM alone if the stop
fails. (The VM would not be recovered elsewhere.)
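
A rough sketch of what that could look like, reusing the crm shell syntax and
the resource definition from the quoted configuration. The explicit stop
operation is the only addition. timeout=90 is an assumption that mirrors the
90s figure mentioned above, so adjust it to whatever your VMs need.

```shell
# Sketch only: same primitive as in the quoted message, plus an explicit
# stop operation with on-fail=block (timeout=90 is an assumed value).
crm configure primitive vm-testvm2 VirtualDomain \
    params config="/datastore/vm/testvm/testvm.xml" migration_transport=ssh \
    meta allow-migrate=true target-role=Started \
    op monitor timeout=30 interval=30 \
    op stop interval=0 timeout=90 on-fail=block
```

If the resource is already defined, the same "op stop" line can be added to
the existing definition with 'crm configure edit vm-testvm2' instead of
creating the primitive again.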
> Matt
--
Ken Gaillot <kgaillot at redhat.com>