[ClusterLabs] Antw: live migration rarely fails seemingly without reason
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Dec 4 02:17:06 EST 2018
Hi!
It seems your systems run with non-operative fencing, and the cluster wants to
fence a node. Maybe bring the cluster to a clean state first, then repeat the
test.
Regards,
Ulrich
>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> schrieb am 03.12.2018
um
16:40 in Nachricht
<1271670904.2579771.1543851612288.JavaMail.zimbra at helmholtz-muenchen.de>:
> Hi,
>
> i have a two node cluster with several VirtualDomains as resources. Normally
> live migration is no problem. But rarely it fails, without giving any
> reasonable
> message in the logs. I tried to migrate several VirtualDmains concurrently
> from ha-idg-2 to ha-idg-1. One VirtualDomain failed, the others suceeded.
>
> This happened already once before, i *think* it was also when i tried to
> migrate several VirtualDomains concurrently, but i'm not complete sure (it's
> already
> some time ago).
>
> But the error appears rarely, it often suceeded without any problems. I
> *think* it's related to migrating several domains concurrently.
>
> node 1:
> 2018-12-03T16:03:03.037872+01:00 ha-idg-1 crmd[8615]: warning: Action 44
> (vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
> 2018-12-03T16:03:03.038234+01:00 ha-idg-1 crmd[8615]: notice: Transition
> 44 aborted by operation vm_mausdb_migrate_to_0 'modify' on ha-idg-2: Event
> failed
> 2018-12-03T16:03:03.038569+01:00 ha-idg-1 crmd[8615]: warning: Action 44
> (vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
>
> 2018-12-03T16:03:03.041771+01:00 ha-idg-1 stonith-ng[8611]: warning:
> fence_ha-idg-2 has 'action' parameter, which should never be specified in
> configuration
> 2018-12-03T16:03:03.042112+01:00 ha-idg-1 stonith-ng[8611]: warning:
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:03:03.042365+01:00 ha-idg-1 stonith-ng[8611]: warning:
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:03:03.208895+01:00 ha-idg-1 sshd[6397]: Received disconnect
> from 146.107.235.132 port 46004:11: disconnected by user
> 2018-12-03T16:03:03.209367+01:00 ha-idg-1 sshd[6397]: Disconnected from
> 146.107.235.132 port 46004
> 2018-12-03T16:03:03.209670+01:00 ha-idg-1 sshd[6397]:
> pam_unix(sshd:session): session closed for user root
> 2018-12-03T16:03:03.213996+01:00 ha-idg-1 systemd-logind[1627]: Removed
> session 17.
> 2018-12-03T16:03:03.286986+01:00 ha-idg-1 crmd[8615]: notice: Transition
> 44 (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=9,
> Source=/var/lib/pacemaker/pengine/pe-input-346.bz2): Stopped
> 2018-12-03T16:03:03.290762+01:00 ha-idg-1 stonith-ng[8611]: warning:
> fence_ha-idg-2 has 'action' parameter, which should never be specified in
> configuration
> 2018-12-03T16:03:03.291224+01:00 ha-idg-1 stonith-ng[8611]: warning:
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:03:03.291619+01:00 ha-idg-1 stonith-ng[8611]: warning:
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:03:03.313864+01:00 ha-idg-1 pengine[8614]: warning:
> Processing failed op migrate_to for vm_mausdb on ha-idg-2: unknown error
(1)
> 2018-12-03T16:03:03.314193+01:00 ha-idg-1 pengine[8614]: warning:
> Processing failed op migrate_to for vm_mausdb on ha-idg-2: unknown error
(1)
> 2018-12-03T16:03:03.316485+01:00 ha-idg-1 pengine[8614]: notice: Recover
> vm_mausdb#011(Started ha-idg-2 -> ha-idg-1)
> 2018-12-03T16:03:03.316805+01:00 ha-idg-1 pengine[8614]: notice: Migrate
> vm_geneious#011(Started ha-idg-2 -> ha-idg-1)
> 2018-12-03T16:03:03.317914+01:00 ha-idg-1 pengine[8614]: notice:
> Calculated transition 45, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-347.bz2
>
> node2:
> 2018-12-03T16:02:32.312598+01:00 ha-idg-2 VirtualDomain(vm_mausdb)[7988]:
> INFO: mausdb: Starting live migration to ha-idg-1 (using: virsh
> --connect=qemu:///system --quiet migrate --live ma
> usdb qemu+ssh://ha-idg-1/system ).
> 2018-12-03T16:02:32.315358+01:00 ha-idg-2 VirtualDomain(vm_geneious)[7989]:
> INFO: geneious: Starting live migration to ha-idg-1 (using: virsh
> --connect=qemu:///system --quiet migrate --live
> geneious qemu+ssh://ha-idg-1/system ).
> 2018-12-03T16:02:32.391911+01:00 ha-idg-2 lrmd[14256]: notice: executing -
> rsc:vm_sim action:stop call_id:78
> 2018-12-03T16:02:32.394065+01:00 ha-idg-2 stonith-ng[14255]: warning:
> fence_ha-idg-1 has 'action' parameter, which should never be specified in
> configuration
> 2018-12-03T16:02:32.394369+01:00 ha-idg-2 stonith-ng[14255]: warning:
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:02:32.394619+01:00 ha-idg-2 stonith-ng[14255]: warning:
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:02:32.459622+01:00 ha-idg-2 VirtualDomain(vm_sim)[8599]: INFO:
> Domain sim already stopped.
> 2018-12-03T16:02:32.464042+01:00 ha-idg-2 lrmd[14256]: notice: finished -
> rsc:vm_sim action:stop call_id:78 pid:8599 exit-code:0 exec-time:72ms
> queue-time:0ms
> 2018-12-03T16:02:32.464750+01:00 ha-idg-2 crmd[14259]: notice: Result of
> stop operation for vm_sim on ha-idg-2: 0 (ok)
> 2018-12-03T16:02:32.470496+01:00 ha-idg-2 stonith-ng[14255]: warning:
> fence_ha-idg-1 has 'action' parameter, which should never be specified in
> configuration
> 2018-12-03T16:02:32.470794+01:00 ha-idg-2 stonith-ng[14255]: warning:
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:02:32.471043+01:00 ha-idg-2 stonith-ng[14255]: warning:
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:02:32.628562+01:00 ha-idg-2 stonith-ng[14255]: warning:
> fence_ha-idg-1 has 'action' parameter, which should never be specified in
> configuration
> 2018-12-03T16:02:32.628970+01:00 ha-idg-2 stonith-ng[14255]: warning:
> Mapping action='Off' to pcmk_off_action='Off'
> 2018-12-03T16:02:32.629226+01:00 ha-idg-2 stonith-ng[14255]: warning:
> Mapping action='Off' to pcmk_reboot_action='Off'
> 2018-12-03T16:03:02.836145+01:00 ha-idg-2 libvirtd[3117]: 2018-12-03
> 15:03:02.835+0000: 4515: error : qemuMigrationCheckJobStatus:1456 :
operation
> failed: migration job: unexpectedly failed
> 2018-12-03T16:03:03.006918+01:00 ha-idg-2 VirtualDomain(vm_mausdb)[7988]:
> ERROR: mausdb: live migration to ha-idg-1 failed: 1
> 2018-12-03T16:03:03.032370+01:00 ha-idg-2 kernel: [ 6593.265436] br0: port
> 2(vnet0) entered disabled state
> 2018-12-03T16:03:03.032383+01:00 ha-idg-2 kernel: [ 6593.266268] device
> vnet0 left promiscuous mode
> 2018-12-03T16:03:03.032385+01:00 ha-idg-2 kernel: [ 6593.266275] br0: port
> 2(vnet0) entered disabled state
> 2018-12-03T16:03:03.032789+01:00 ha-idg-2 lrmd[14256]: notice:
> vm_mausdb_migrate_to_0:7988:stderr [ error: operation failed: migration job:
> unexpectedly failed ]
> 2018-12-03T16:03:03.033126+01:00 ha-idg-2 lrmd[14256]: notice:
> vm_mausdb_migrate_to_0:7988:stderr [ ocf-exit-reason:mausdb: live migration
> to ha-idg-1 failed: 1 ]
> 2018-12-03T16:03:03.033376+01:00 ha-idg-2 lrmd[14256]: notice: finished -
> rsc:vm_mausdb action:migrate_to call_id:75 pid:7988 exit-code:1
> exec-time:31065ms queue-time:0ms
> 2018-12-03T16:03:03.033909+01:00 ha-idg-2 crmd[14259]: notice: Result of
> migrate_to operation for vm_mausdb on ha-idg-2: 1 (unknown error)
> 2018-12-03T16:03:03.034198+01:00 ha-idg-2 crmd[14259]: notice:
> ha-idg-2-vm_mausdb_migrate_to_0:75 [ error: operation failed: migration job:
> unexpectedly failed\nocf-exit-reason:mausdb: live migration to ha-idg-1
> failed: 1\n ]
>
> /var/log/pacemaker.log:
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng: info: cib_devices_update:
> Updating devices to version 1.789.6
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng: warning: xml2device_params:
> fence_ha-idg-2 has 'action' parameter, which should never be specified in
> configuration
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng: warning: map_action:
> Mapping action='Off' to pcmk_off_action='Off'
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng: warning: map_action:
> Mapping action='Off' to pcmk_reboot_action='Off'
> Dec 03 16:02:32 [8611] ha-idg-1 stonith-ng: info: cib_device_update:
> Device fence_ha-idg-1 has been disabled on ha-idg-1: score=-INFINITY
> Dec 03 16:02:37 [8610] ha-idg-1 cib: info: cib_process_ping:
> Reporting our current digest to ha-idg-1: 2715b3a8fd056fbd4d84cb12b911a328
> for 1.789.6 (0x1aebf00 0)
> Dec 03 16:03:03 [8610] ha-idg-1 cib: info: cib_perform_op:
> Diff: --- 1.789.6 2
> Dec 03 16:03:03 [8610] ha-idg-1 cib: info: cib_perform_op:
> Diff: +++ 1.789.7 (null)
> Dec 03 16:03:03 [8610] ha-idg-1 cib: info: cib_perform_op: +
> /cib: @num_updates=7
> Dec 03 16:03:03 [8610] ha-idg-1 cib: info: cib_perform_op: +
>
/cib/status/node_state[@id='1084777492']/lrm[@id='1084777492']/lrm_resources/
> lrm_resource[@id='vm_mausdb']/lrm_rsc
> _op[@id='vm_mausdb_last_0']:
> @transition-magic=0:1;44:44:0:e4b76be7-fdb6-47c1-a905-9a49650b4180,
> @exit-reason=mausdb: live migration to ha-idg-1 failed: 1, @call-id=75,
> @rc-code=1, @op-sta
> tus=0, @exec-time=31065
> Dec 03 16:03:03 [8610] ha-idg-1 cib: info: cib_perform_op: ++
>
/cib/status/node_state[@id='1084777492']/lrm[@id='1084777492']/lrm_resources/
> lrm_resource[@id='vm_mausdb']: <lrm_
> rsc_op id="vm_mausdb_last_failure_0" operation_key="vm_mausdb_migrate_to_0"
> operation="migrate_to" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.13" transition-key="44:44:0:e4b
> 76be7-fdb6-47c1-a905-9a49650b4180"
> transition-magic="0:1;44:44:0:e4b76be7-fdb6-47c1-a905-9a49650b4180"
> exit-reason="mausdb: live migrat
> Dec 03 16:03:03 [8610] ha-idg-1 cib: info: cib_process_request:
> Completed cib_modify operation for section status: OK (rc=0,
> origin=ha-idg-2/crmd/74, version=1.789.7)
> Dec 03 16:03:03 [8615] ha-idg-1 crmd: warning: status_from_rc:
> Action 44 (vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1):
> Error
> Dec 03 16:03:03 [8615] ha-idg-1 crmd: notice:
> abort_transition_graph: Transition 44 aborted by operation
> vm_mausdb_migrate_to_0 'modify' on ha-idg-2: Event failed | magic=0:1;44:4
> 4:0:e4b76be7-fdb6-47c1-a905-9a49650b4180 cib=1.789.7
> source=match_graph_event:310 complete=false
> Dec 03 16:03:03 [8615] ha-idg-1 crmd: info: match_graph_event:
> Action vm_mausdb_migrate_to_0 (44) confirmed on ha-idg-2 (rc=1)
> Dec 03 16:03:03 [8615] ha-idg-1 crmd: info: process_graph_event:
> Detected action (44.44) vm_mausdb_migrate_to_0.75=unknown error: failed
> Dec 03 16:03:03 [8615] ha-idg-1 crmd: warning: status_from_rc:
> Action 44 (vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1):
> Error
> Dec 03 16:03:03 [8615] ha-idg-1 crmd: info:
> abort_transition_graph: Transition 44 aborted by operation
> vm_mausdb_migrate_to_0 'create' on ha-idg-2: Event failed | magic=0:1;44:4
> 4:0:e4b76be7-fdb6-47c1-a905-9a49650b4180 cib=1.789.7
> source=match_graph_event:310 complete=false
> Dec 03 16:03:03 [8615] ha-idg-1 crmd: info: match_graph_event:
> Action vm_mausdb_migrate_to_0 (44) confirmed on ha-idg-2 (rc=1)
> Dec 03 16:03:03 [8615] ha-idg-1 crmd: info: process_graph_event:
> Detected action (44.44) vm_mausdb_migrate_to_0.75=unknown error: failed
> Dec 03 16:03:03 [8611] ha-idg-1 stonith-ng: info:
> update_cib_stonith_devices_v2: Updating device list from the cib: modify
> lrm_rsc_op[@id='vm_mausdb_last_0']
>
> Hosts are SLES 12 SP3, pacenaker is the most recent for my OS:
> pacemaker-1.1.16-6.5.1.x86_64
>
> I don't have any clue, i'm thankful for any hint. I can provide you with
> further information if needed.
>
>
> Bernd
> --
>
> Bernd Lentes
> Systemadministration
> Institut für Entwicklungsgenetik
> Gebäude 35.34 - Raum 208
> HelmholtzZentrum münchen
> [ mailto:bernd.lentes at helmholtz-muenchen.de |
> bernd.lentes at helmholtz-muenchen.de ]
> phone: +49 89 3187 1241
> fax: +49 89 3187 2294
> [ http://www.helmholtz-muenchen.de/idg |
> http://www.helmholtz-muenchen.de/idg ]
>
> wer Fehler macht kann etwas lernen
> wer nichts macht kann auch nichts lernen
>
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDirig.in Petra Steiner-Hoffmann
> Stellv.Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
> Geschaeftsfuehrer: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich
> Bassler, Dr. rer. nat. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
More information about the Users
mailing list