[ClusterLabs] cleanup of a resource leads to restart of Virtual Domains
Yan Gao
YGao at suse.com
Thu Sep 26 11:19:39 EDT 2019
Hi,
On 9/26/19 3:25 PM, Lentes, Bernd wrote:
> Hi,
>
> I had two errors with a GFS2 partition several days ago:
> gfs2_share_monitor_30000 on ha-idg-2 'unknown error' (1): call=103, status=Timed Out, exitreason='',
> last-rc-change='Thu Sep 19 13:44:22 2019', queued=0ms, exec=0ms
>
> gfs2_share_monitor_30000 on ha-idg-1 'unknown error' (1): call=103, status=Timed Out, exitreason='',
> last-rc-change='Thu Sep 19 13:44:12 2019', queued=0ms, exec=0ms
>
> Now I wanted to get rid of these messages and did a "resource cleanup".
> I had to do this several times until both messages disappeared.
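> (In crmsh a per-resource cleanup has the form
>
> crm resource cleanup <resource> [<node>]
>
> and can optionally be restricted to a single node.)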
>
> But then all VirtualDomain resources restarted.
>
> The config for the GFS2 resource is:
> primitive gfs2_share Filesystem \
> params device="/dev/vg_san/lv_share" directory="/mnt/share" fstype=gfs2 options=acl \
> op monitor interval=30 timeout=20 \
> op start timeout=60 interval=0 \
> op stop timeout=60 interval=0 \
> meta is-managed=true
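> (The op that timed out above is this 30-second monitor, whose
> configured timeout is only 20 seconds. If that is simply too tight for
> GFS2 under load, it could be raised via "crm configure edit gfs2_share"
> and changing that line to something like:
>
> op monitor interval=30 timeout=60
>
> where 60 is only an illustrative value.)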
>
> /mnt/share holds the config files for the VirtualDomain resources.
>
> Here is one VirtualDomain config (the others are the same):
> primitive vm_crispor VirtualDomain \
> params config="/mnt/share/crispor.xml" \
> params hypervisor="qemu:///system" \
> params migration_transport=ssh \
> params migrate_options="--p2p --tunnelled" \
> op start interval=0 timeout=120 \
> op stop interval=0 timeout=180 \
> op monitor interval=30 timeout=25 \
> op migrate_from interval=0 timeout=300 \
> op migrate_to interval=0 timeout=300 \
> meta allow-migrate=true target-role=Started is-managed=true maintenance=false \
> utilization cpu=2 hv_memory=8192
>
> The GFS2 share is part of a group, and the group is cloned:
> group gr_share dlm clvmd gfs2_share gfs2_snap fs_ocfs2
> clone cl_share gr_share \
> meta target-role=Started interleave=true
>
> And for each VirtualDomain I have an order constraint:
> order or_vm_crispor_after_gfs2 Mandatory: cl_share vm_crispor symmetrical=true
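> (With kind "Mandatory", whenever cl_share stops or restarts, the
> dependent VM has to be stopped first. A merely advisory version would
> look like:
>
> order or_vm_crispor_after_gfs2 Optional: cl_share vm_crispor
>
> though that would no longer guarantee the filesystem is mounted before
> the VM starts.)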
>
> Why are the domains restarted? I thought a cleanup would just delete the error messages.
A cleanup deletes the resource's operation history from the CIB, which
makes the cluster re-probe the resource to re-detect its current state.
While the state of cl_share is unknown, the mandatory ordering can cause
the dependent VirtualDomain resources to be scheduled for a restart. It
could potentially be fixed by this:
https://github.com/ClusterLabs/pacemaker/pull/1765
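
If that fix isn't in your version yet, one way to limit the impact is to
clean up as narrowly as possible, i.e. only the failed resource on the
affected node rather than the whole clone:

crm resource cleanup gfs2_share ha-idg-1
crm resource cleanup gfs2_share ha-idg-2

You can also check what actions the cluster currently intends to take
with a live dry run:

crm_simulate -SL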
Regards,
Yan