[ClusterLabs] Resources suddenly get target-role="stopped"

Klechomir klecho at gmail.com
Mon Dec 7 08:03:05 UTC 2015


Hi Ken,
My comments are inline below.

On 4.12.2015 19:06, Ken Gaillot wrote:
> On 12/04/2015 10:22 AM, Klechomir wrote:
>> Hi list,
>> My issue is the following:
>>
>> I have a very stable cluster, using Corosync 2.1.0.26 and Pacemaker 1.1.8
>> (I observed the same problem with Corosync 2.3.5 & Pacemaker 1.1.13-rc3).
>>
>> I bumped into this issue when I started playing with VirtualDomain
>> resources, but it seems to be unrelated to the RA.
>>
>> The problem is that, for no apparent reason, a resource gets
>> target-role="Stopped". This happens after a (successful) migration, after a
>> failover, or after a VM restart.
>>
>> My tests showed that changing the resource name makes the problem go away,
>> but that is only a temporary workaround.
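
Side note: one way to confirm whether a Stopped role is really landing in the
CIB, assuming crm_resource on these versions supports the usual --meta
--get-parameter/--delete-parameter options, is to read the meta attribute back
and to clear it again if needed:

    # query the resource's current target-role meta attribute
    crm_resource --resource VMA_VM1 --meta --get-parameter target-role
    # if it reports "Stopped", remove it so the resource is allowed to start again
    crm_resource --resource VMA_VM1 --meta --delete-parameter target-role
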
>>
>> The resource configuration is:
>> primitive VMA_VM1 ocf:heartbeat:VirtualDomain \
>>          params config="/NFSvolumes/CDrive1/VM1/VM1.xml"
>> hypervisor="qemu:///system" migration_transport="tcp" \
>>          meta allow-migrate="true" target-role="Started" \
>>          op start interval="0" timeout="120s" \
>>          op stop interval="0" timeout="120s" \
>>          op monitor interval="10" timeout="30" depth="0" \
>>          utilization cpu="1" hv_memory="925"
>> order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1
>>
>> Here is the log from one such stop, after a successful migration with "crm
>> migrate resource VMA_VM1":
>>
>> Dec 04 15:18:22 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
>> Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564)
>> Dec 04 15:18:22 [4434] CLUSTER-1       lrmd:     info:
>> cancel_recurring_action:         Cancelling operation VMA_VM1_monitor_10000
>> Dec 04 15:18:23 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
>> Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled
>> Dec 04 15:18:23 [3818929] CLUSTER-1       crmd:    debug:
>> do_lrm_rsc_op:        Performing
>> key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_migrate_to_0
>> VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:23 DEBUG:
>> Virtual domain VM1 is currently running.
>> VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:23 INFO: VM1:
>> Starting live migration to CLUSTER-2 (using virsh
>> --connect=qemu:///system --quiet migrate --live  VM1
>> qemu+tcp://CLUSTER-2/system ).
>> Dec 04 15:18:24 [3818929] CLUSTER-1       crmd:     info:
>> process_lrm_event:    LRM operation VMA_VM1_monitor_10000 (call=5564,
>> status=1, cib-update=0, confirmed=false) Cancelled
>> Dec 04 15:18:24 [3818929] CLUSTER-1       crmd:    debug:
>> update_history_cache:         Updating history for 'VMA_VM1' with
>> monitor op
>> VirtualDomain(VMA_VM1)[1797698]:        2015/12/04_15:18:26 INFO: VM1:
>> live migration to CLUSTER-2 succeeded.
>> Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:    debug:
>> operation_finished:      VMA_VM1_migrate_to_0:1797698 - exited with rc=0
>> Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:   notice:
>> operation_finished:      VMA_VM1_migrate_to_0:1797698 [
>> 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2
>> (using virsh --connect=qemu:///system --quiet migrate --live  VM1
>> qemu+tcp://CLUSTER-2/system ). ]
>> Dec 04 15:18:26 [4434] CLUSTER-1       lrmd:   notice:
>> operation_finished:      VMA_VM1_migrate_to_0:1797698 [
>> 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. ]
>> Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:    debug:
>> create_operation_update:      do_update_resource: Updating resouce
>> VMA_VM1 after complete migrate_to op (interval=0)
>> Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:   notice:
>> process_lrm_event:    LRM operation VMA_VM1_migrate_to_0 (call=5697,
>> rc=0, cib-update=89, confirmed=true) ok
>> Dec 04 15:18:27 [3818929] CLUSTER-1       crmd:    debug:
>> update_history_cache:         Updating history for 'VMA_VM1' with
>> migrate_to op
>> Dec 04 15:18:31 [3818929] CLUSTER-1       crmd:    debug: cancel_op:
>> Operation VMA_VM1:5564 already cancelled
>> Dec 04 15:18:31 [3818929] CLUSTER-1       crmd:    debug:
>> do_lrm_rsc_op:        Performing
>> key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0
>> VirtualDomain(VMA_VM1)[1798719]:        2015/12/04_15:18:31 DEBUG:
>> Virtual domain VM1 is not running:  failed to get domain 'vm1' domain
>> not found: no domain with matching name 'vm1'
> This looks like the problem. Configuration error?

As far as I checked, this is a harmless bug in the VirtualDomain RA. It
downcases the output of the "virsh domain info" command so that it can parse
the status more easily, which prevents the domain name from matching.
In any case the error doesn't affect the RA's functionality; here it just
finds out that the resource is already stopped. My big concern is why the
resource got stopped in the first place.
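
For illustration, here is a rough sketch of the kind of status check involved
(not the actual agent code; the virsh subcommand and options are assumptions):
the whole virsh output, error messages included, is piped through tr, so the
error text comes back with a lowercased 'vm1' even though the domain is
defined as 'VM1':

    # sketch of a downcased status probe, roughly as described above
    status="$(LANG=C virsh --connect=qemu:///system domstate VM1 2>&1 | tr 'A-Z' 'a-z')"
    case "$status" in
        running|paused|idle|blocked|"in shutdown"|"shut off")
            echo "Virtual domain VM1 is currently $status." ;;
        *"domain not found"*|*"failed to get domain"*)
            echo "Virtual domain VM1 is not running: $status" ;;
    esac

In the sketch the lowercasing only affects the message text, not the result of
the probe, which matches the harmless-looking log entry above.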

>> VirtualDomain(VMA_VM1)[1798719]:        2015/12/04_15:18:31 INFO: Domain
>> VM1 already stopped.
>> Dec 04 15:18:31 [4434] CLUSTER-1       lrmd:    debug:
>> operation_finished:      VMA_VM1_stop_0:1798719 - exited with rc=0
>> Dec 04 15:18:31 [4434] CLUSTER-1       lrmd:   notice:
>> operation_finished:      VMA_VM1_stop_0:1798719 [ 2015/12/04_15:18:31
>> INFO: Domain VM1 already stopped. ]
>> Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:    debug:
>> create_operation_update:      do_update_resource: Updating resouce
>> VMA_VM1 after complete stop op (interval=0)
>> Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:   notice:
>> process_lrm_event:    LRM operation VMA_VM1_stop_0 (call=5701, rc=0,
>> cib-update=90, confirmed=true) ok
>> Dec 04 15:18:32 [3818929] CLUSTER-1       crmd:    debug:
>> update_history_cache:         Updating history for 'VMA_VM1' with stop op
>> Dec 04 15:20:58 [3818929] CLUSTER-1       crmd:    debug:
>> create_operation_update:      build_active_RAs: Updating resouce VMA_VM1
>> after complete stop op (interval=0)
>> Dec 04 15:20:58 [3818929] CLUSTER-1       crmd:    debug:
>> create_operation_update:      build_active_RAs: Updating resouce VMA_VM1
>> after complete monitor op (interval=0)
>> Dec 04 15:23:31 [1833996] CLUSTER-1 crm_resource:    debug:
>> process_orphan_resource:    Detected orphan resource VMA_VM1 on CLUSTER-2
>>
>> Any suggestions are welcome.
>>
>> Best regards,
>> Klecho
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org




