[ClusterLabs] Resources suddenly get target-role="stopped"
Boyan Ikonomov
boyan at euronas.com
Mon Dec 7 18:12:35 UTC 2015
Mystery solved:
Never put:
[ "${2}" = release ] && crm resource stop VMA_${1}
inside
/etc/libvirt/hooks/qemu
Very wrong decision.
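
For anyone who hits the same thing, here is a minimal sketch of the offending
hook (the argument positions follow the standard libvirt hook convention:
$1 = domain name, $2 = operation):

    #!/bin/sh
    # /etc/libvirt/hooks/qemu -- run by libvirtd on every domain lifecycle event
    # $1 = domain name, $2 = operation (prepare/start/started/stopped/release/...)
    #
    # "release" also fires on the *source* node once a live migration has
    # finished and the local QEMU process is torn down, so this line tells
    # Pacemaker to stop the resource -- and "crm resource stop" sets
    # target-role="Stopped" in the CIB, cluster-wide, not just on this node.
    [ "${2}" = release ] && crm resource stop VMA_${1}
    exit 0   # a hook should normally report success back to libvirtd

That would also explain why renaming the resource appeared to "fix" it: the
hook could no longer find a resource called VMA_<domain name> to stop.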
On Monday 07 December 2015 16:49:01 emmanuel segura wrote:
> Next time, please show your full config, unless it contains something
> sensitive that you can't share.
>
> 2015-12-07 9:08 GMT+01:00 Klechomir <klecho at gmail.com>:
> > Hi,
> > Sorry, I didn't get your point.
> >
> > The XML of the VM is on an active/active DRBD device with an OCFS2
> > filesystem on it and is visible from both nodes.
> > The live migration is always successful.
> >
> > On 4.12.2015 19:30, emmanuel segura wrote:
> >> I think the XML of your VM needs to be available on both nodes, but
> >> you're using a failover resource (Filesystem_CDrive1); Pacemaker
> >> monitors resources on both nodes to check whether they are running on
> >> multiple nodes.
> >>
> >> 2015-12-04 18:06 GMT+01:00 Ken Gaillot <kgaillot at redhat.com>:
> >>> On 12/04/2015 10:22 AM, Klechomir wrote:
> >>>> Hi list,
> >>>> My issue is the following:
> >>>>
> >>>> I have a very stable cluster, using Corosync 2.1.0.26 and Pacemaker 1.1.8
> >>>> (I observed the same problem with Corosync 2.3.5 & Pacemaker 1.1.13-rc3).
> >>>>
> >>>> I bumped into this issue when I started playing with VirtualDomain
> >>>> resources, but it seems to be unrelated to the RA.
> >>>>
> >>>> The problem is that, for no apparent reason, a resource gets
> >>>> target-role="Stopped". This happens after a (successful) migration,
> >>>> after a failover, or after a VM restart.
> >>>>
> >>>> My tests showed that changing the resource name fixes the problem, but
> >>>> that is only a temporary workaround.
> >>>>
> >>>> The resource configuration is:
> >>>> primitive VMA_VM1 ocf:heartbeat:VirtualDomain \
> >>>>         params config="/NFSvolumes/CDrive1/VM1/VM1.xml" hypervisor="qemu:///system" migration_transport="tcp" \
> >>>>         meta allow-migrate="true" target-role="Started" \
> >>>>         op start interval="0" timeout="120s" \
> >>>>         op stop interval="0" timeout="120s" \
> >>>>         op monitor interval="10" timeout="30" depth="0" \
> >>>>         utilization cpu="1" hv_memory="925"
> >>>>
> >>>> order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1
> >>>>
> >>>> Here is the log from one such stop, after a successful migration with
> >>>> "crm migrate resource VMA_VM1":
> >>>>
> >>>> Dec 04 15:18:22 [3818929] CLUSTER-1 crmd: debug: cancel_op:
> >>>> Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564)
> >>>> Dec 04 15:18:22 [4434] CLUSTER-1 lrmd: info:
> >>>> cancel_recurring_action: Cancelling operation
> >>>> VMA_VM1_monitor_10000
> >>>> Dec 04 15:18:23 [3818929] CLUSTER-1 crmd: debug: cancel_op:
> >>>> Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled
> >>>> Dec 04 15:18:23 [3818929] CLUSTER-1 crmd: debug:
> >>>> do_lrm_rsc_op: Performing
> >>>> key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56
> >>>> op=VMA_VM1_migrate_to_0
> >>>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:23 DEBUG:
> >>>> Virtual domain VM1 is currently running.
> >>>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:23 INFO: VM1:
> >>>> Starting live migration to CLUSTER-2 (using virsh
> >>>> --connect=qemu:///system --quiet migrate --live VM1
> >>>> qemu+tcp://CLUSTER-2/system ).
> >>>> Dec 04 15:18:24 [3818929] CLUSTER-1 crmd: info:
> >>>> process_lrm_event: LRM operation VMA_VM1_monitor_10000 (call=5564,
> >>>> status=1, cib-update=0, confirmed=false) Cancelled
> >>>> Dec 04 15:18:24 [3818929] CLUSTER-1 crmd: debug:
> >>>> update_history_cache: Updating history for 'VMA_VM1' with
> >>>> monitor op
> >>>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:26 INFO: VM1:
> >>>> live migration to CLUSTER-2 succeeded.
> >>>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: debug:
> >>>> operation_finished: VMA_VM1_migrate_to_0:1797698 - exited with
> >>>> rc=0
> >>>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: notice:
> >>>> operation_finished: VMA_VM1_migrate_to_0:1797698 [
> >>>> 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2
> >>>> (using virsh --connect=qemu:///system --quiet migrate --live VM1
> >>>> qemu+tcp://CLUSTER-2/system ). ]
> >>>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: notice:
> >>>> operation_finished: VMA_VM1_migrate_to_0:1797698 [
> >>>> 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. ]
> >>>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: debug:
> >>>> create_operation_update: do_update_resource: Updating resouce
> >>>> VMA_VM1 after complete migrate_to op (interval=0)
> >>>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: notice:
> >>>> process_lrm_event: LRM operation VMA_VM1_migrate_to_0 (call=5697,
> >>>> rc=0, cib-update=89, confirmed=true) ok
> >>>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: debug:
> >>>> update_history_cache: Updating history for 'VMA_VM1' with
> >>>> migrate_to op
> >>>> Dec 04 15:18:31 [3818929] CLUSTER-1 crmd: debug: cancel_op:
> >>>> Operation VMA_VM1:5564 already cancelled
> >>>> Dec 04 15:18:31 [3818929] CLUSTER-1 crmd: debug:
> >>>> do_lrm_rsc_op: Performing
> >>>> key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0
> >>>> VirtualDomain(VMA_VM1)[1798719]: 2015/12/04_15:18:31 DEBUG:
> >>>> Virtual domain VM1 is not running: failed to get domain 'vm1' domain
> >>>> not found: no domain with matching name 'vm1'
> >>>
> >>> This looks like the problem. Configuration error?
> >>>
> >>>> VirtualDomain(VMA_VM1)[1798719]: 2015/12/04_15:18:31 INFO:
> >>>> Domain
> >>>> VM1 already stopped.
> >>>> Dec 04 15:18:31 [4434] CLUSTER-1 lrmd: debug:
> >>>> operation_finished: VMA_VM1_stop_0:1798719 - exited with rc=0
> >>>> Dec 04 15:18:31 [4434] CLUSTER-1 lrmd: notice:
> >>>> operation_finished: VMA_VM1_stop_0:1798719 [ 2015/12/04_15:18:31
> >>>> INFO: Domain VM1 already stopped. ]
> >>>> Dec 04 15:18:32 [3818929] CLUSTER-1 crmd: debug:
> >>>> create_operation_update: do_update_resource: Updating resouce
> >>>> VMA_VM1 after complete stop op (interval=0)
> >>>> Dec 04 15:18:32 [3818929] CLUSTER-1 crmd: notice:
> >>>> process_lrm_event: LRM operation VMA_VM1_stop_0 (call=5701, rc=0,
> >>>> cib-update=90, confirmed=true) ok
> >>>> Dec 04 15:18:32 [3818929] CLUSTER-1 crmd: debug:
> >>>> update_history_cache: Updating history for 'VMA_VM1' with stop
> >>>> op
> >>>> Dec 04 15:20:58 [3818929] CLUSTER-1 crmd: debug:
> >>>> create_operation_update: build_active_RAs: Updating resouce
> >>>> VMA_VM1
> >>>> after complete stop op (interval=0)
> >>>> Dec 04 15:20:58 [3818929] CLUSTER-1 crmd: debug:
> >>>> create_operation_update: build_active_RAs: Updating resouce
> >>>> VMA_VM1
> >>>> after complete monitor op (interval=0)
> >>>> Dec 04 15:23:31 [1833996] CLUSTER-1 crm_resource: debug:
> >>>> process_orphan_resource: Detected orphan resource VMA_VM1 on
> >>>> CLUSTER-2
> >>>>
> >>>> Any suggestions are welcome.
> >>>>
> >>>> Best regards,
> >>>> Klecho
> >>>>