[ClusterLabs] why is node fenced ?
Lentes, Bernd
bernd.lentes at helmholtz-muenchen.de
Wed Jul 29 11:26:46 EDT 2020
Hi,
a few days ago one of my nodes was fenced and i don't know why, which is something i really don't like.
What i did:
I put one node (ha-idg-1) in standby. The resources on it (most of all virtual domains) were migrated to ha-idg-2,
except one domain (vm_nextcloud). On ha-idg-2 a mountpoint was missing the xml of the domain points to.
Then the cluster tries to start vm_nextcloud on ha-idg-2 which of course also failed.
Then ha-idg-1 was fenced.
I did a "crm history" over the respective time period, you find it here:
https://hmgubox2.helmholtz-muenchen.de/index.php/s/529dfcXf5a72ifF
Here, from my point of view, the most interesting from the logs:
ha-idg-1:
Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.16196.19 2
Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.16197.0 bc9a558dfbe6d7196653ce56ad1ee758
Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: + /cib: @epoch=16197, @num_updates=0
Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: + /cib/configuration/nodes/node[@id='1084777482']/instance_attributes[@id='nodes-108
4777482']/nvpair[@id='nodes-1084777482-standby']: @value=on
ha-idg-1 set to standby
Jul 20 16:59:34 [23768] ha-idg-1 crmd: notice: process_lrm_event: ha-idg-1-vm_nextcloud_migrate_to_0:3169 [ error: Cannot access storage file '/mnt/mcd/AG_BioInformatik/Technik/software_und_treiber/linux/ubuntu/ubuntu-18.04.4-live-server-amd64.iso': No such file or directory\nocf-exit-reason:vm_nextcloud: live migration to ha-idg-2 failed: 1\n ]
migration failed
Jul 20 17:04:01 [23767] ha-idg-1 pengine: error: native_create_actions: Resource vm_nextcloud is active on 2 nodes (attempting recovery)
???
Jul 20 17:04:01 [23767] ha-idg-1 pengine: notice: LogAction: * Recover vm_nextcloud ( ha-idg-2 )
Jul 20 17:04:01 [23768] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_nextcloud_stop_0 on ha-idg-2 | action 106
Jul 20 17:04:01 [23768] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_nextcloud_stop_0 locally on ha-idg-1 | action 2
Jul 20 17:04:01 [23768] ha-idg-1 crmd: info: match_graph_event: Action vm_nextcloud_stop_0 (106) confirmed on ha-idg-2 (rc=0)
Jul 20 17:04:06 [23768] ha-idg-1 crmd: notice: process_lrm_event: Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok) | call=3197 key=vm_nextcloud_stop_0 confirmed=true cib-update=5960
Jul 20 17:05:29 [23761] ha-idg-1 pacemakerd: notice: crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler)
systemctl stop pacemaker.service
ha-idg-2:
Jul 20 17:04:03 [10691] ha-idg-2 crmd: notice: process_lrm_event: Result of stop operation for vm_nextcloud on ha-idg-2: 0 (ok) | call=157 key=vm_nextcloud_stop_0 confirmed=true cib-update=57
the log from ha-idg-2 is two seconds ahead of ha-idg-1
Jul 20 17:04:08 [10688] ha-idg-2 lrmd: notice: log_execute: executing - rsc:vm_nextcloud action:start call_id:192
Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: vm_nextcloud_start_0:29107:stderr [ error: Failed to create domain from /mnt/share/vm_nextcloud.xml ]
Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: vm_nextcloud_start_0:29107:stderr [ error: Cannot access storage file '/mnt/mcd/AG_BioInformatik/Technik/software_und_treiber/linux/ubuntu/ubuntu-18.04.4-live-server-amd64.iso': No such file or directory ]
Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: vm_nextcloud_start_0:29107:stderr [ ocf-exit-reason:Failed to start virtual domain vm_nextcloud. ]
Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: log_finished: finished - rsc:vm_nextcloud action:start call_id:192 pid:29107 exit-code:1 exec-time:581ms queue-time:0ms
start on ha-idg-2 failed
Jul 20 17:05:32 [10691] ha-idg-2 crmd: info: do_dc_takeover: Taking over DC status for this partition
ha-idg-1 stopped pacemaker
Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_nextcloud on ha-idg-1: unknown error | rc=1
Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: unpack_rsc_op_failure: Processing failed start of vm_nextcloud on ha-idg-2: unknown error | rc
Jul 20 17:05:33 [10690] ha-idg-2 pengine: info: native_color: Resource vm_nextcloud cannot run anywhere
logical
Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: custom_action: Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (pending)
???
Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: custom_action: Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (offline)
Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: pe_fence_node: Cluster node ha-idg-1 will be fenced: resource actions are unrunnable
Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: stage6: Scheduling Node ha-idg-1 for STONITH
Jul 20 17:05:35 [10690] ha-idg-2 pengine: info: native_stop_constraints: vm_nextcloud_stop_0 is implicit after ha-idg-1 is fenced
Jul 20 17:05:35 [10690] ha-idg-2 pengine: notice: LogNodeActions: * Fence (Off) ha-idg-1 'resource actions are unrunnable'
Why does it say "Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: custom_action: Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (offline)" although
"Jul 20 17:04:06 [23768] ha-idg-1 crmd: notice: process_lrm_event: Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok) | call=3197 key=vm_nextcloud_stop_0 confirmed=true cib-update=5960"
says that stop was ok ?
Bernd
--
Bernd Lentes
Systemadministration
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.lentes at helmholtz-muenchen.de
phone: +49 89 3187 1241
phone: +49 89 3187 3827
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/mcd
stay healthy
Helmholtz Zentrum München
Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir.in Prof. Dr. Veronika von Messling
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671
More information about the Users
mailing list