[ClusterLabs] Antw: [EXT] VirtualDomain restart caused fencing.

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Jul 1 02:19:08 EDT 2021


>>> Matthew Schumacher <matt.s at aptalaska.net> wrote on 30.06.2021 at 17:40 in
message <50a4af9f-23c3-5913-1ffa-6503792da20a at aptalaska.net>:
> Hello,
> 
> I'm not sure how to fix this, but calling 'crm resource restart vm-name' 
> this morning caused an entire node to get fenced, kicking the stool out from 
> under a number of VMs.
> 
> Looking at VirtualDomain, it looks like the system defaults to a 90s timeout, 
> and if it can't gracefully shut down the VM with 'virsh shutdown' in 85s, then 
> it calls 'virsh destroy'.  For whatever reason, that's not what happened.
> 
> I created a mockup where I moved a test VM to its own node (in case it gets 
> fenced), then loaded something that would ignore ACPI shutdown, then called 
> restart.  This time it worked.  The logs show:
> 
> Jun 30 15:32:11  VirtualDomain(vm-testvm)[13047]:    INFO: Issuing graceful 
> shutdown request for domain testvm.
> Jun 30 15:32:26  VirtualDomain(vm-testvm)[13047]:    INFO: Issuing forced 
> shutdown (destroy) request for domain testvm.
> 
> I don't have the logs from the original failure due to my node not being 
> persistent, but I wonder if anyone else has run into this.

Typically it's a stop timeout: when a resource's stop operation fails or times out, the cluster fences the node by default. You should capture the logs!
In general, collecting the syslog of all cluster nodes on one host outside of the cluster can be a valuable help for debugging cluster problems (admittedly we still don't have that ourselves, but I'm working on it ;-)).

> 
> Here is my resource configuration if that reveals the issue:
> 
> crm configure primitive vm-testvm2 VirtualDomain params 
> config="/datastore/vm/testvm/testvm.xml" migration_transport=ssh meta 
> allow-migrate=true target-role=Started op monitor timeout=30 interval=30

A timeout while accessing /datastore (where the domain XML lives) might trigger that as well.

> 
> Oh, one last question:  Can I disable fencing for a specific resource for 
> testing reasons?  I'd love to watch this break without fear of fencing.

on-fail=ignore (set per operation, e.g. on the stop operation)
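
A minimal sketch of what that could look like, based on your primitive above. The explicit stop operation and its timeout=90 are my assumptions (90s being the default you mention), not something taken from your cluster, and I have not tested this exact line. It also assumes vm-testvm2 is not defined yet; for an existing resource you would change the operation via 'crm configure edit' instead.

```shell
# Sketch only: the primitive from above with an explicit stop operation added.
# timeout=90 is an assumed value; on-fail=ignore tells the cluster to carry on
# as if a failed stop had succeeded, so use it on a throwaway test VM only.
crm configure primitive vm-testvm2 VirtualDomain \
    params config="/datastore/vm/testvm/testvm.xml" migration_transport=ssh \
    meta allow-migrate=true target-role=Started \
    op monitor timeout=30 interval=30 \
    op stop timeout=90 interval=0 on-fail=ignore
```

Keep in mind that with a failed stop ignored, the cluster may start the VM elsewhere while it is still running on the old node, so don't leave this on anything that matters.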

> 
> Matt





