[ClusterLabs] Resources suddenly get target-role="stopped"

emmanuel segura emi2fast at gmail.com
Mon Dec 7 15:49:01 UTC 2015


Next time, please show your full config, unless it contains something
sensitive that you can't share.
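
For example, the full configuration (minus anything sensitive) can be
dumped with the crm shell or straight from the CIB:

  crm configure show           # crm shell view of the configuration
  cibadmin --query > cib.xml   # raw CIB XML, including the status section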

2015-12-07 9:08 GMT+01:00 Klechomir <klecho at gmail.com>:
> Hi,
> Sorry, I didn't get your point.
>
> The XML of the VM is on an active-active DRBD device with an OCFS2
> filesystem on it, and is visible from both nodes.
> The live migration is always successful.
>
>
>
> On 4.12.2015 19:30, emmanuel segura wrote:
>>
>> I think the XML of your VM needs to be available on both nodes, but
>> you're using a failover resource, Filesystem_CDrive1. Pacemaker
>> monitors resources on both nodes to check whether they are running on
>> multiple nodes.
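>>
>> For a dual-primary DRBD/OCFS2 setup, the filesystem is normally a
>> cloned resource, so it is mounted and monitored on both nodes. A
>> minimal sketch (the device path here is illustrative):
>>
>>   primitive Filesystem_CDrive1 ocf:heartbeat:Filesystem \
>>           params device="/dev/drbd0" directory="/NFSvolumes/CDrive1" \
>>                   fstype="ocfs2" \
>>           op monitor interval="20" timeout="40"
>>   clone cl_Filesystem_CDrive1 Filesystem_CDrive1 meta interleave="true"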
>>
>> 2015-12-04 18:06 GMT+01:00 Ken Gaillot <kgaillot at redhat.com>:
>>>
>>> On 12/04/2015 10:22 AM, Klechomir wrote:
>>>>
>>>> Hi list,
>>>> My issue is the following:
>>>>
>>>> I have a very stable cluster, using Corosync 2.1.0.26 and Pacemaker
>>>> 1.1.8 (I observed the same problem with Corosync 2.3.5 & Pacemaker
>>>> 1.1.13-rc3).
>>>>
>>>> I bumped into this issue when I started playing with VirtualDomain
>>>> resources, but it seems to be unrelated to the RA.
>>>>
>>>> The problem is that, without apparent reason, a resource gets
>>>> target-role="Stopped". This happens after a (successful) migration,
>>>> after a failover, or after a VM restart.
>>>>
>>>> My tests showed that changing the resource name fixes the problem,
>>>> but that seems to be only a temporary workaround.
>>>>
>>>> The resource configuration is:
>>>> primitive VMA_VM1 ocf:heartbeat:VirtualDomain \
>>>>          params config="/NFSvolumes/CDrive1/VM1/VM1.xml" \
>>>>                  hypervisor="qemu:///system" migration_transport="tcp" \
>>>>          meta allow-migrate="true" target-role="Started" \
>>>>          op start interval="0" timeout="120s" \
>>>>          op stop interval="0" timeout="120s" \
>>>>          op monitor interval="10" timeout="30" depth="0" \
>>>>          utilization cpu="1" hv_memory="925"
>>>> order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1
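>>>>
>>>> (For what it's worth, the stray meta attribute can be inspected and
>>>> removed by hand while debugging, e.g.:
>>>>
>>>>   crm_resource --resource VMA_VM1 --meta --get-parameter target-role
>>>>   crm_resource --resource VMA_VM1 --meta --delete-parameter target-role
>>>>
>>>> though that only clears the symptom, not whatever is setting it.)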
>>>>
>>>> Here is the log from one such stop, after a successful migration with
>>>> "crm resource migrate VMA_VM1":
>>>>
>>>> Dec 04 15:18:22 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
>>>> Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564)
>>>> Dec 04 15:18:22 [4434] CLUSTER-1       lrmd:     info:
>>>> cancel_recurring_action:         Cancelling operation
>>>> VMA_VM1_monitor_10000
>>>> Dec 04 15:18:23 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
>>>> Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled
>>>> Dec 04 15:18:23 [3818929] CLUSTER-1       crmd:    debug:
>>>> do_lrm_rsc_op:        Performing
>>>> key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56
>>>> op=VMA_VM1_migrate_to_0
>>>> VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:23 DEBUG:
>>>> Virtual domain VM1 is currently running.
>>>> VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:23 INFO: VM1:
>>>> Starting live migration to CLUSTER-2 (using virsh
>>>> --connect=qemu:///system --quiet migrate --live  VM1
>>>> qemu+tcp://CLUSTER-2/system ).
>>>> Dec 04 15:18:24 [3818929] CLUSTER-1       crmd:     info:
>>>> process_lrm_event:    LRM operation VMA_VM1_monitor_10000 (call=5564,
>>>> status=1, cib-update=0, confirmed=false) Cancelled
>>>> Dec 04 15:18:24 [3818929] CLUSTER-1       crmd:    debug:
>>>> update_history_cache:         Updating history for 'VMA_VM1' with
>>>> monitor op
>>>> VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:26 INFO: VM1:
>>>> live migration to CLUSTER-2 succeeded.
>>>> Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:    debug:
>>>> operation_finished:      VMA_VM1_migrate_to_0:1797698 - exited with rc=0
>>>> Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:   notice:
>>>> operation_finished:      VMA_VM1_migrate_to_0:1797698 [
>>>> 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2
>>>> (using virsh --connect=qemu:///system --quiet migrate --live  VM1
>>>> qemu+tcp://CLUSTER-2/system ). ]
>>>> Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:   notice:
>>>> operation_finished:      VMA_VM1_migrate_to_0:1797698 [
>>>> 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. ]
>>>> Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:    debug:
>>>> create_operation_update:      do_update_resource: Updating resouce
>>>> VMA_VM1 after complete migrate_to op (interval=0)
>>>> Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:   notice:
>>>> process_lrm_event:    LRM operation VMA_VM1_migrate_to_0 (call=5697,
>>>> rc=0, cib-update=89, confirmed=true) ok
>>>> Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:    debug:
>>>> update_history_cache:         Updating history for 'VMA_VM1' with
>>>> migrate_to op
>>>> Dec 04 15:18:31 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
>>>> Operation VMA_VM1:5564 already cancelled
>>>> Dec 04 15:18:31 [3818929] CLUSTER-1       crmd:    debug:
>>>> do_lrm_rsc_op:        Performing
>>>> key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0
>>>> VirtualDomain(VMA_VM1)[1798719]:        2015/12/04_15:18:31 DEBUG:
>>>> Virtual domain VM1 is not running:  failed to get domain 'vm1' domain
>>>> not found: no domain with matching name 'vm1'
>>>
>>> This looks like the problem. Configuration error?
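>>>
>>> One way to double-check is to compare the <name> element in the domain
>>> XML with what libvirt itself reports on each node, e.g.:
>>>
>>>   grep '<name>' /NFSvolumes/CDrive1/VM1/VM1.xml
>>>   virsh --connect qemu:///system list --all
>>>
>>> (The error above looks up 'vm1' while the resource manages VM1, and
>>> libvirt domain names are case-sensitive.)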
>>>
>>>> VirtualDomain(VMA_VM1)[1798719]:        2015/12/04_15:18:31 INFO: Domain
>>>> VM1 already stopped.
>>>> Dec 04 15:18:31 [4434] CLUSTER-1       lrmd:    debug:
>>>> operation_finished:      VMA_VM1_stop_0:1798719 - exited with rc=0
>>>> Dec 04 15:18:31 [4434] CLUSTER-1       lrmd:   notice:
>>>> operation_finished:      VMA_VM1_stop_0:1798719 [ 2015/12/04_15:18:31
>>>> INFO: Domain VM1 already stopped. ]
>>>> Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:    debug:
>>>> create_operation_update:      do_update_resource: Updating resouce
>>>> VMA_VM1 after complete stop op (interval=0)
>>>> Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:   notice:
>>>> process_lrm_event:    LRM operation VMA_VM1_stop_0 (call=5701, rc=0,
>>>> cib-update=90, confirmed=true) ok
>>>> Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:    debug:
>>>> update_history_cache:         Updating history for 'VMA_VM1' with stop
>>>> op
>>>> Dec 04 15:20:58 [3818929] CLUSTER-1       crmd:    debug:
>>>> create_operation_update:      build_active_RAs: Updating resouce VMA_VM1
>>>> after complete stop op (interval=0)
>>>> Dec 04 15:20:58 [3818929] CLUSTER-1       crmd:    debug:
>>>> create_operation_update:      build_active_RAs: Updating resouce VMA_VM1
>>>> after complete monitor op (interval=0)
>>>> Dec 04 15:23:31 [1833996] CLUSTER-1 crm_resource:    debug:
>>>> process_orphan_resource:    Detected orphan resource VMA_VM1 on
>>>> CLUSTER-2
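>>>>
>>>> (If it helps, the orphan entry can be cleared from the status section
>>>> with "crm_resource --cleanup --resource VMA_VM1".)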
>>>>
>>>> Any suggestions are welcome.
>>>>
>>>> Best regards,
>>>> Klecho
>>>>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^



