[ClusterLabs] cleanup of a resource leads to restart of Virtual Domains
Yan Gao
YGao at suse.com
Thu Sep 26 11:19:39 EDT 2019
Hi,
On 9/26/19 3:25 PM, Lentes, Bernd wrote:
> Hi,
>
> I had two errors with a GFS2 partition several days ago:
> gfs2_share_monitor_30000 on ha-idg-2 'unknown error' (1): call=103, status=Timed Out, exitreason='',
> last-rc-change='Thu Sep 19 13:44:22 2019', queued=0ms, exec=0ms
>
> gfs2_share_monitor_30000 on ha-idg-1 'unknown error' (1): call=103, status=Timed Out, exitreason='',
> last-rc-change='Thu Sep 19 13:44:12 2019', queued=0ms, exec=0ms
>
> Now I wanted to get rid of these messages and did a "resource cleanup".
> I had to do this several times until both messages disappeared.
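> (In crmsh a per-resource cleanup has the form
>
> crm resource cleanup <resource> [<node>]
>
> and can optionally be restricted to a single node.)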
>
> But then all VirtualDomain resources restarted.
>
> The config for the GFS2 resource is:
> primitive gfs2_share Filesystem \
> params device="/dev/vg_san/lv_share" directory="/mnt/share" fstype=gfs2 options=acl \
> op monitor interval=30 timeout=20 \
> op start timeout=60 interval=0 \
> op stop timeout=60 interval=0 \
> meta is-managed=true
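> (The op that timed out above is this 30-second monitor, whose
> configured timeout is only 20 seconds. If that is simply too tight for
> GFS2 under load, it could be raised via "crm configure edit gfs2_share"
> and changing that line to something like:
>
> op monitor interval=30 timeout=60
>
> where 60 is only an illustrative value.)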
>
> /mnt/share holds the config files for the VirtualDomain resources.
>
> Here is one VirtualDomain config (the others are the same):
> primitive vm_crispor VirtualDomain \
> params config="/mnt/share/crispor.xml" \
> params hypervisor="qemu:///system" \
> params migration_transport=ssh \
> params migrate_options="--p2p --tunnelled" \
> op start interval=0 timeout=120 \
> op stop interval=0 timeout=180 \
> op monitor interval=30 timeout=25 \
> op migrate_from interval=0 timeout=300 \
> op migrate_to interval=0 timeout=300 \
> meta allow-migrate=true target-role=Started is-managed=true maintenance=false \
> utilization cpu=2 hv_memory=8192
>
> The GFS2 share is part of a group, and the group is cloned:
> group gr_share dlm clvmd gfs2_share gfs2_snap fs_ocfs2
> clone cl_share gr_share \
> meta target-role=Started interleave=true
>
> And for each VirtualDomain I have an order constraint:
> order or_vm_crispor_after_gfs2 Mandatory: cl_share vm_crispor symmetrical=true
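> (With kind "Mandatory", whenever cl_share stops or restarts, the
> dependent VM has to be stopped first. A merely advisory version would
> look like:
>
> order or_vm_crispor_after_gfs2 Optional: cl_share vm_crispor
>
> though that would no longer guarantee the filesystem is mounted before
> the VM starts.)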
>
> Why are the domains restarted? I thought a cleanup would just delete the error messages.
A cleanup deletes the resource's operation history from the CIB, which
makes the cluster re-probe the resource to re-detect its current state.
While the state of cl_share is unknown, the mandatory ordering can cause
the dependent VirtualDomain resources to be scheduled for a restart. It
could potentially be fixed by this:
https://github.com/ClusterLabs/pacemaker/pull/1765
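
If that fix isn't in your version yet, one way to limit the impact is to
clean up as narrowly as possible, i.e. only the failed resource on the
affected node rather than the whole clone:

crm resource cleanup gfs2_share ha-idg-1
crm resource cleanup gfs2_share ha-idg-2

You can also check what actions the cluster currently intends to take
with a live dry run:

crm_simulate -SL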
Regards,
Yan