[ClusterLabs] Why "Stop" action isn't called during failover?

Mon Nov 20 17:44:00 EST 2017

On Fri, 2017-11-10 at 11:15 +0200, Klecho wrote:
> Hi List,
> 
> I have a VM, which is constraint dependent on its storage resource.
> 
> When the storage resource goes down, I'm observing the following:
> 
> (pacemaker 1.1.16 & corosync 2.4.2)
> 
> Nov 10 10:04:36 [1202] NODE-2    pengine:     info: LogActions:      Leave   vm_lomem1       (Started NODE-2)
> 
> Filesystem(p_AA_Filesystem_Drive16)[2097324]: 2017/11/10_10:04:37 INFO: sending signal TERM to: libvirt+ 1160142       1  0 09:01 ?        Sl     0:07 qemu-system-x86_64
> 
> 
> The VM (VirtualDomain RA) gets killed without calling "Stop" RA
> action.
> 
> Isn't the proper way to call "Stop" for all related resources in such
> cases?

In the log above, it's not Pacemaker that's killing the VM; it's the
Filesystem resource agent itself.

When the Filesystem agent gets a stop request, if it's unable to
unmount the filesystem, it can try further action according to its
force_unmount option: "This option allows specifying how to handle
processes that are currently accessing the mount directory ... Default
value, kill processes accessing mount point".
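
For illustration, here is a minimal sketch of how force_unmount could
be set on the Filesystem resource in the CIB. The device, directory and
fstype values and the nvpair ids are placeholders I made up, not taken
from your configuration. The agent documents "true" (the default),
"safe" and "false" as values.

```xml
<primitive id="p_AA_Filesystem_Drive16" class="ocf" provider="heartbeat" type="Filesystem">
  <instance_attributes id="p_AA_Filesystem_Drive16-params">
    <!-- Placeholder values: substitute the real device, mount point and type -->
    <nvpair id="p_AA_Filesystem_Drive16-device" name="device" value="/dev/EXAMPLE"/>
    <nvpair id="p_AA_Filesystem_Drive16-directory" name="directory" value="/mnt/example"/>
    <nvpair id="p_AA_Filesystem_Drive16-fstype" name="fstype" value="ext4"/>
    <!-- "false": do not kill processes that are using the mount point -->
    <nvpair id="p_AA_Filesystem_Drive16-force_unmount" name="force_unmount" value="false"/>
  </instance_attributes>
</primitive>
```

Note that with force_unmount set to "false", the stop can fail if
something still holds the mount busy, so this only changes which
failure you see. It does not fix the ordering problem.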

What does the configuration for the resources and constraints look
like? Based on what you described, Pacemaker shouldn't try to stop the
Filesystem resource until it has successfully stopped the VM.
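
As a point of comparison, this is a sketch of the kind of constraints
that would give that behavior, assuming the resource names from your
logs. The constraint ids are hypothetical, and your actual
configuration may differ.

```xml
<constraints>
  <!-- Start the filesystem before the VM. Ordering is symmetrical by
       default, so on stop the VM is stopped before the filesystem. -->
  <rsc_order id="order-fs-then-vm" first="p_AA_Filesystem_Drive16"
             then="vm_lomem1" kind="Mandatory"/>
  <!-- Run the VM only on the node where the filesystem is active -->
  <rsc_colocation id="colo-vm-with-fs" rsc="vm_lomem1"
                  with-rsc="p_AA_Filesystem_Drive16" score="INFINITY"/>
</constraints>
```

If the ordering constraint is missing, or is not mandatory, Pacemaker
has no reason to stop the VM before the Filesystem resource.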
-- 
Ken Gaillot <kgaillot at redhat.com>