[ClusterLabs] Antw: live migration rarely fails seemingly without reason

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Dec 4 07:17:06 UTC 2018


Hi!

It seems your systems run with non-operative fencing, and the cluster wants to
fence a node. Maybe bring the cluster to a clean state first, then repeat the
test.

Regards,
Ulrich

>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> schrieb am 03.12.2018
um
16:40 in Nachricht
<1271670904.2579771.1543851612288.JavaMail.zimbra at helmholtz-muenchen.de>:
> Hi,
> 
> i have a two node cluster with several VirtualDomains as resources. Normally

> live migration is no problem. But rarely it fails, without giving any 
> reasonable 
> message in the logs. I tried to migrate several VirtualDmains concurrently 
> from ha-idg-2 to ha-idg-1. One VirtualDomain failed, the others suceeded.
> 
> This happened already once before, i *think* it was also when i tried to 
> migrate several VirtualDomains concurrently, but i'm not complete sure (it's

> already
> some time ago).
> 
> But the error appears rarely, it often suceeded without any problems. I 
> *think* it's related to migrating several domains concurrently.
> 
> node 1:
> 2018-12-03T16:03:03.037872+01:00 ha-idg-1 crmd[8615]:  warning: Action 44 
> (vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
> 2018-12-03T16:03:03.038234+01:00 ha-idg-1 crmd[8615]:   notice: Transition 
> 44 aborted by operation vm_mausdb_migrate_to_0 'modify' on ha-idg-2: Event 
> failed
> 2018-12-03T16:03:03.038569+01:00 ha-idg-1 crmd[8615]:  warning: Action 44 
> (vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
> 
> 2018-12-03T16:03:03.041771+01:00 ha-idg-1 stonith-ng[8611]:  warning: 
> fence_ha-idg-2 has 'action' parameter, which should never be specified in 
> configuration
> 2018-12-03T16:03:03.042112+01:00 ha-idg-1 stonith-ng[8611]:  warning: 
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:03:03.042365+01:00 ha-idg-1 stonith-ng[8611]:  warning: 
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:03:03.208895+01:00 ha-idg-1 sshd[6397]: Received disconnect 
> from 146.107.235.132 port 46004:11: disconnected by user
> 2018-12-03T16:03:03.209367+01:00 ha-idg-1 sshd[6397]: Disconnected from 
> 146.107.235.132 port 46004
> 2018-12-03T16:03:03.209670+01:00 ha-idg-1 sshd[6397]: 
> pam_unix(sshd:session): session closed for user root
> 2018-12-03T16:03:03.213996+01:00 ha-idg-1 systemd-logind[1627]: Removed 
> session 17.
> 2018-12-03T16:03:03.286986+01:00 ha-idg-1 crmd[8615]:   notice: Transition 
> 44 (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=9, 
> Source=/var/lib/pacemaker/pengine/pe-input-346.bz2): Stopped
> 2018-12-03T16:03:03.290762+01:00 ha-idg-1 stonith-ng[8611]:  warning: 
> fence_ha-idg-2 has 'action' parameter, which should never be specified in 
> configuration
> 2018-12-03T16:03:03.291224+01:00 ha-idg-1 stonith-ng[8611]:  warning: 
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:03:03.291619+01:00 ha-idg-1 stonith-ng[8611]:  warning: 
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:03:03.313864+01:00 ha-idg-1 pengine[8614]:  warning: 
> Processing failed op migrate_to for vm_mausdb on ha-idg-2: unknown error
(1)
> 2018-12-03T16:03:03.314193+01:00 ha-idg-1 pengine[8614]:  warning: 
> Processing failed op migrate_to for vm_mausdb on ha-idg-2: unknown error
(1)
> 2018-12-03T16:03:03.316485+01:00 ha-idg-1 pengine[8614]:   notice: Recover 
> vm_mausdb#011(Started ha-idg-2 -> ha-idg-1)
> 2018-12-03T16:03:03.316805+01:00 ha-idg-1 pengine[8614]:   notice: Migrate 
> vm_geneious#011(Started ha-idg-2 -> ha-idg-1)
> 2018-12-03T16:03:03.317914+01:00 ha-idg-1 pengine[8614]:   notice: 
> Calculated transition 45, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-347.bz2
> 
> node2:
> 2018-12-03T16:02:32.312598+01:00 ha-idg-2 VirtualDomain(vm_mausdb)[7988]: 
> INFO: mausdb: Starting live migration to ha-idg-1 (using: virsh 
> --connect=qemu:///system --quiet migrate --live  ma
> usdb qemu+ssh://ha-idg-1/system ).
> 2018-12-03T16:02:32.315358+01:00 ha-idg-2 VirtualDomain(vm_geneious)[7989]:

> INFO: geneious: Starting live migration to ha-idg-1 (using: virsh 
> --connect=qemu:///system --quiet migrate --live
>   geneious qemu+ssh://ha-idg-1/system ).
> 2018-12-03T16:02:32.391911+01:00 ha-idg-2 lrmd[14256]:   notice: executing -

> rsc:vm_sim action:stop call_id:78
> 2018-12-03T16:02:32.394065+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> fence_ha-idg-1 has 'action' parameter, which should never be specified in 
> configuration
> 2018-12-03T16:02:32.394369+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:02:32.394619+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:02:32.459622+01:00 ha-idg-2 VirtualDomain(vm_sim)[8599]: INFO:

> Domain sim already stopped.
> 2018-12-03T16:02:32.464042+01:00 ha-idg-2 lrmd[14256]:   notice: finished -

> rsc:vm_sim action:stop call_id:78 pid:8599 exit-code:0 exec-time:72ms 
> queue-time:0ms
> 2018-12-03T16:02:32.464750+01:00 ha-idg-2 crmd[14259]:   notice: Result of 
> stop operation for vm_sim on ha-idg-2: 0 (ok)
> 2018-12-03T16:02:32.470496+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> fence_ha-idg-1 has 'action' parameter, which should never be specified in 
> configuration
> 2018-12-03T16:02:32.470794+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:02:32.471043+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:02:32.628562+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> fence_ha-idg-1 has 'action' parameter, which should never be specified in 
> configuration
> 2018-12-03T16:02:32.628970+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:02:32.629226+01:00 ha-idg-2 stonith-ng[14255]:  warning: 
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:03:02.836145+01:00 ha-idg-2 libvirtd[3117]: 2018-12-03 
> 15:03:02.835+0000: 4515: error : qemuMigrationCheckJobStatus:1456 :
operation 
> failed: migration job: unexpectedly failed
> 2018-12-03T16:03:03.006918+01:00 ha-idg-2 VirtualDomain(vm_mausdb)[7988]: 
> ERROR: mausdb: live migration to ha-idg-1 failed: 1
> 2018-12-03T16:03:03.032370+01:00 ha-idg-2 kernel: [ 6593.265436] br0: port 
> 2(vnet0) entered disabled state
> 2018-12-03T16:03:03.032383+01:00 ha-idg-2 kernel: [ 6593.266268] device 
> vnet0 left promiscuous mode
> 2018-12-03T16:03:03.032385+01:00 ha-idg-2 kernel: [ 6593.266275] br0: port 
> 2(vnet0) entered disabled state
> 2018-12-03T16:03:03.032789+01:00 ha-idg-2 lrmd[14256]:   notice: 
> vm_mausdb_migrate_to_0:7988:stderr [ error: operation failed: migration job:

> unexpectedly failed ]
> 2018-12-03T16:03:03.033126+01:00 ha-idg-2 lrmd[14256]:   notice: 
> vm_mausdb_migrate_to_0:7988:stderr [ ocf-exit-reason:mausdb: live migration

> to ha-idg-1 failed: 1 ]
> 2018-12-03T16:03:03.033376+01:00 ha-idg-2 lrmd[14256]:   notice: finished -

> rsc:vm_mausdb action:migrate_to call_id:75 pid:7988 exit-code:1 
> exec-time:31065ms queue-time:0ms
> 2018-12-03T16:03:03.033909+01:00 ha-idg-2 crmd[14259]:   notice: Result of 
> migrate_to operation for vm_mausdb on ha-idg-2: 1 (unknown error)
> 2018-12-03T16:03:03.034198+01:00 ha-idg-2 crmd[14259]:   notice: 
> ha-idg-2-vm_mausdb_migrate_to_0:75 [ error: operation failed: migration job:

> unexpectedly failed\nocf-exit-reason:mausdb: live migration to ha-idg-1 
> failed: 1\n ]
> 
> /var/log/pacemaker.log:
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng:     info: cib_devices_update:   

>    Updating devices to version 1.789.6
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng:  warning: xml2device_params:    

>    fence_ha-idg-2 has 'action' parameter, which should never be specified in

> configuration
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng:  warning: map_action:       
> Mapping action='Off' to pcmk_off_action='Off'
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng:  warning: map_action:       
> Mapping action='Off' to pcmk_reboot_action='Off'
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng:     info: cib_device_update:    

>    Device fence_ha-idg-1 has been disabled on ha-idg-1: score=-INFINITY
> Dec 03 16:02:37 [8610] ha-idg-1        cib:     info: cib_process_ping: 
> Reporting our current digest to ha-idg-1: 2715b3a8fd056fbd4d84cb12b911a328 
> for 1.789.6 (0x1aebf00 0)
> Dec 03 16:03:03 [8610] ha-idg-1        cib:     info: cib_perform_op:   
> Diff: --- 1.789.6 2
> Dec 03 16:03:03 [8610] ha-idg-1        cib:     info: cib_perform_op:   
> Diff: +++ 1.789.7 (null)
> Dec 03 16:03:03 [8610] ha-idg-1        cib:     info: cib_perform_op:   +  
> /cib:  @num_updates=7
> Dec 03 16:03:03 [8610] ha-idg-1        cib:     info: cib_perform_op:   +  
>
/cib/status/node_state[@id='1084777492']/lrm[@id='1084777492']/lrm_resources/
> lrm_resource[@id='vm_mausdb']/lrm_rsc
> _op[@id='vm_mausdb_last_0']:  
> @transition-magic=0:1;44:44:0:e4b76be7-fdb6-47c1-a905-9a49650b4180, 
> @exit-reason=mausdb: live migration to ha-idg-1 failed: 1, @call-id=75, 
> @rc-code=1, @op-sta
> tus=0, @exec-time=31065
> Dec 03 16:03:03 [8610] ha-idg-1        cib:     info: cib_perform_op:   ++ 
>
/cib/status/node_state[@id='1084777492']/lrm[@id='1084777492']/lrm_resources/
> lrm_resource[@id='vm_mausdb']:  <lrm_
> rsc_op id="vm_mausdb_last_failure_0" operation_key="vm_mausdb_migrate_to_0"

> operation="migrate_to" crm-debug-origin="do_update_resource" 
> crm_feature_set="3.0.13" transition-key="44:44:0:e4b
> 76be7-fdb6-47c1-a905-9a49650b4180" 
> transition-magic="0:1;44:44:0:e4b76be7-fdb6-47c1-a905-9a49650b4180" 
> exit-reason="mausdb: live migrat
> Dec 03 16:03:03 [8610] ha-idg-1        cib:     info: cib_process_request:  

>    Completed cib_modify operation for section status: OK (rc=0, 
> origin=ha-idg-2/crmd/74, version=1.789.7)
> Dec 03 16:03:03 [8615] ha-idg-1       crmd:  warning: status_from_rc:   
> Action 44 (vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1):

> Error
> Dec 03 16:03:03 [8615] ha-idg-1       crmd:   notice: 
> abort_transition_graph:   Transition 44 aborted by operation 
> vm_mausdb_migrate_to_0 'modify' on ha-idg-2: Event failed | magic=0:1;44:4
> 4:0:e4b76be7-fdb6-47c1-a905-9a49650b4180 cib=1.789.7 
> source=match_graph_event:310 complete=false
> Dec 03 16:03:03 [8615] ha-idg-1       crmd:     info: match_graph_event:    

>    Action vm_mausdb_migrate_to_0 (44) confirmed on ha-idg-2 (rc=1)
> Dec 03 16:03:03 [8615] ha-idg-1       crmd:     info: process_graph_event:  

>    Detected action (44.44) vm_mausdb_migrate_to_0.75=unknown error: failed
> Dec 03 16:03:03 [8615] ha-idg-1       crmd:  warning: status_from_rc:   
> Action 44 (vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1):

> Error
> Dec 03 16:03:03 [8615] ha-idg-1       crmd:     info: 
> abort_transition_graph:   Transition 44 aborted by operation 
> vm_mausdb_migrate_to_0 'create' on ha-idg-2: Event failed | magic=0:1;44:4
> 4:0:e4b76be7-fdb6-47c1-a905-9a49650b4180 cib=1.789.7 
> source=match_graph_event:310 complete=false
> Dec 03 16:03:03 [8615] ha-idg-1       crmd:     info: match_graph_event:    

>    Action vm_mausdb_migrate_to_0 (44) confirmed on ha-idg-2 (rc=1)
> Dec 03 16:03:03 [8615] ha-idg-1       crmd:     info: process_graph_event:  

>    Detected action (44.44) vm_mausdb_migrate_to_0.75=unknown error: failed
> Dec 03 16:03:03 [8611] ha-idg-1 stonith-ng:     info: 
> update_cib_stonith_devices_v2:    Updating device list from the cib: modify

> lrm_rsc_op[@id='vm_mausdb_last_0']
> 
> Hosts are SLES 12 SP3, pacenaker is the most recent for my OS: 
> pacemaker-1.1.16-6.5.1.x86_64
> 
> I don't have any clue, i'm thankful for any hint. I can provide you with 
> further information if needed.
> 
> 
> Bernd
> -- 
> 
> Bernd Lentes 
> Systemadministration 
> Institut für Entwicklungsgenetik 
> Gebäude 35.34 - Raum 208 
> HelmholtzZentrum münchen 
> [ mailto:bernd.lentes at helmholtz-muenchen.de | 
> bernd.lentes at helmholtz-muenchen.de ] 
> phone: +49 89 3187 1241 
> fax: +49 89 3187 2294 
> [ http://www.helmholtz-muenchen.de/idg | 
> http://www.helmholtz-muenchen.de/idg ] 
> 
> wer Fehler macht kann etwas lernen 
> wer nichts macht kann auch nichts lernen
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de 
> Aufsichtsratsvorsitzende: MinDirig.in Petra Steiner-Hoffmann
> Stellv.Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
> Geschaeftsfuehrer: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich 
> Bassler, Dr. rer. nat. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org 





More information about the Users mailing list