[ClusterLabs] why is node fenced ?

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Wed Jul 29 11:26:46 EDT 2020


Hi,

A few days ago one of my nodes was fenced and I don't know why, which is something I really don't like.
What I did:
I put one node (ha-idg-1) in standby. The resources on it (mostly virtual domains) were migrated to ha-idg-2,
except one domain (vm_nextcloud). On ha-idg-2, a mount point that the domain's XML refers to was missing.
The cluster then tried to start vm_nextcloud on ha-idg-2, which of course also failed.
Then ha-idg-1 was fenced.
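
A check along the following lines, run on the target node before putting the other node in standby, might catch a missing mount point like this one in advance. It is only a sketch under assumptions: the function name missing_storage_files is hypothetical, and it assumes the domain XML follows the usual libvirt layout in which file-backed disks and CD-ROM images appear as <disk> elements with a <source file='...'/> child.

```python
import os
import xml.etree.ElementTree as ET


def missing_storage_files(domain_xml, exists=os.path.exists):
    """Return file-backed disk sources of a libvirt domain that are not present.

    domain_xml: text of the domain definition (root element <domain>).
    exists:     callable used to test a path; defaults to os.path.exists
                and can be replaced for testing.
    """
    root = ET.fromstring(domain_xml)
    missing = []
    # Assumption: file-backed disks and ISO images are declared as
    # <devices><disk><source file="..."/></disk></devices>.
    for source in root.findall("./devices/disk/source"):
        path = source.get("file")
        # Block devices use dev="..." instead of file="..." and are skipped.
        if path and not exists(path):
            missing.append(path)
    return missing
```

On ha-idg-2 this could be called with the contents of /mnt/share/vm_nextcloud.xml. A non-empty result would mean the domain cannot be started or migrated there until the listed paths are available.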

I ran "crm history" over the relevant time period; you can find the result here:
https://hmgubox2.helmholtz-muenchen.de/index.php/s/529dfcXf5a72ifF

Here are what I consider the most interesting entries from the logs:
ha-idg-1:
Jul 20 16:59:33 [23763] ha-idg-1        cib:     info: cib_perform_op:  Diff: --- 2.16196.19 2
Jul 20 16:59:33 [23763] ha-idg-1        cib:     info: cib_perform_op:  Diff: +++ 2.16197.0 bc9a558dfbe6d7196653ce56ad1ee758
Jul 20 16:59:33 [23763] ha-idg-1        cib:     info: cib_perform_op:  +  /cib:  @epoch=16197, @num_updates=0
Jul 20 16:59:33 [23763] ha-idg-1        cib:     info: cib_perform_op:  +  /cib/configuration/nodes/node[@id='1084777482']/instance_attributes[@id='nodes-1084777482']/nvpair[@id='nodes-1084777482-standby']:  @value=on
ha-idg-1 set to standby

Jul 20 16:59:34 [23768] ha-idg-1       crmd:   notice: process_lrm_event:       ha-idg-1-vm_nextcloud_migrate_to_0:3169 [ error: Cannot access storage file '/mnt/mcd/AG_BioInformatik/Technik/software_und_treiber/linux/ubuntu/ubuntu-18.04.4-live-server-amd64.iso': No such file or directory\nocf-exit-reason:vm_nextcloud: live migration to ha-idg-2 failed: 1\n ]
migration failed

Jul 20 17:04:01 [23767] ha-idg-1    pengine:    error: native_create_actions:   Resource vm_nextcloud is active on 2 nodes (attempting recovery)
???

Jul 20 17:04:01 [23767] ha-idg-1    pengine:   notice: LogAction:        * Recover    vm_nextcloud           (             ha-idg-2 )

Jul 20 17:04:01 [23768] ha-idg-1       crmd:   notice: te_rsc_command:  Initiating stop operation vm_nextcloud_stop_0 on ha-idg-2 | action 106
Jul 20 17:04:01 [23768] ha-idg-1       crmd:   notice: te_rsc_command:  Initiating stop operation vm_nextcloud_stop_0 locally on ha-idg-1 | action 2

Jul 20 17:04:01 [23768] ha-idg-1       crmd:     info: match_graph_event:       Action vm_nextcloud_stop_0 (106) confirmed on ha-idg-2 (rc=0)

Jul 20 17:04:06 [23768] ha-idg-1       crmd:   notice: process_lrm_event:       Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok) | call=3197 key=vm_nextcloud_stop_0 confirmed=true cib-update=5960

Jul 20 17:05:29 [23761] ha-idg-1 pacemakerd:   notice: crm_signal_dispatch:     Caught 'Terminated' signal | 15 (invoking handler)
systemctl stop pacemaker.service


ha-idg-2:
Jul 20 17:04:03 [10691] ha-idg-2       crmd:   notice: process_lrm_event:       Result of stop operation for vm_nextcloud on ha-idg-2: 0 (ok) | call=157 key=vm_nextcloud_stop_0 confirmed=true cib-update=57
(the timestamps in the log from ha-idg-2 are two seconds ahead of those from ha-idg-1)

Jul 20 17:04:08 [10688] ha-idg-2       lrmd:   notice: log_execute:     executing - rsc:vm_nextcloud action:start call_id:192
Jul 20 17:04:09 [10688] ha-idg-2       lrmd:   notice: operation_finished:      vm_nextcloud_start_0:29107:stderr [ error: Failed to create domain from /mnt/share/vm_nextcloud.xml ]
Jul 20 17:04:09 [10688] ha-idg-2       lrmd:   notice: operation_finished:      vm_nextcloud_start_0:29107:stderr [ error: Cannot access storage file '/mnt/mcd/AG_BioInformatik/Technik/software_und_treiber/linux/ubuntu/ubuntu-18.04.4-live-server-amd64.iso': No such file or directory ]
Jul 20 17:04:09 [10688] ha-idg-2       lrmd:   notice: operation_finished:      vm_nextcloud_start_0:29107:stderr [ ocf-exit-reason:Failed to start virtual domain vm_nextcloud. ]
Jul 20 17:04:09 [10688] ha-idg-2       lrmd:   notice: log_finished:    finished - rsc:vm_nextcloud action:start call_id:192 pid:29107 exit-code:1 exec-time:581ms queue-time:0ms
start on ha-idg-2 failed

Jul 20 17:05:32 [10691] ha-idg-2       crmd:     info: do_dc_takeover:  Taking over DC status for this partition
ha-idg-1 stopped pacemaker

Jul 20 17:05:33 [10690] ha-idg-2    pengine:  warning: unpack_rsc_op_failure:   Processing failed migrate_to of vm_nextcloud on ha-idg-1: unknown error | rc=1
Jul 20 17:05:33 [10690] ha-idg-2    pengine:  warning: unpack_rsc_op_failure:   Processing failed start of vm_nextcloud on ha-idg-2: unknown error | rc

Jul 20 17:05:33 [10690] ha-idg-2    pengine:     info: native_color:    Resource vm_nextcloud cannot run anywhere
logical

Jul 20 17:05:33 [10690] ha-idg-2    pengine:  warning: custom_action:   Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (pending)
???

Jul 20 17:05:35 [10690] ha-idg-2    pengine:  warning: custom_action:   Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (offline)
Jul 20 17:05:35 [10690] ha-idg-2    pengine:  warning: pe_fence_node:   Cluster node ha-idg-1 will be fenced: resource actions are unrunnable
Jul 20 17:05:35 [10690] ha-idg-2    pengine:  warning: stage6:  Scheduling Node ha-idg-1 for STONITH
Jul 20 17:05:35 [10690] ha-idg-2    pengine:     info: native_stop_constraints: vm_nextcloud_stop_0 is implicit after ha-idg-1 is fenced
Jul 20 17:05:35 [10690] ha-idg-2    pengine:   notice: LogNodeActions:   * Fence (Off) ha-idg-1 'resource actions are unrunnable'
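
Since the clock on ha-idg-2 runs about two seconds ahead, it can help to put the entries from both nodes on one corrected timeline before comparing them. The following is a minimal sketch, not a finished tool: the function name merged_timeline and the offsets mapping are hypothetical, the offset is assumed to be constant, and month names are assumed to be parsed under an English or C locale.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# Jul 20 17:04:01 [23768] ha-idg-1       crmd:   notice: te_rsc_command:  ...
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \[\d+\] (\S+)\s+(.*)$")


def merged_timeline(lines, offsets, year=2020):
    """Merge log lines from several nodes, sorted by skew-corrected time.

    lines:   iterable of log lines in the format quoted in this mail.
    offsets: dict mapping a node name to the number of seconds its clock
             runs ahead (assumption: constant, e.g. {"ha-idg-2": 2}).
    Returns a list of (corrected_datetime, node, rest_of_line) tuples.
    Lines that do not match the log format (such as comments) are skipped.
    """
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp, node, rest = match.groups()
        # The log lines carry no year, so one is supplied explicitly.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        when -= timedelta(seconds=offsets.get(node, 0))
        events.append((when, node, rest))
    # Sort on time only; entries with equal timestamps keep their input order.
    events.sort(key=lambda event: event[0])
    return events
```

Called with the lines from both logs and offsets={"ha-idg-2": 2}, the stop results from both nodes and the later scheduler decisions on ha-idg-2 end up in one ordered list.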


Why does the log on ha-idg-2 say
"Jul 20 17:05:35 [10690] ha-idg-2    pengine:  warning: custom_action:   Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (offline)"
although the earlier entry on ha-idg-1,
"Jul 20 17:04:06 [23768] ha-idg-1       crmd:   notice: process_lrm_event:       Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok) | call=3197 key=vm_nextcloud_stop_0 confirmed=true cib-update=5960"
says that the stop was OK?


Bernd

-- 

Bernd Lentes 
Systemadministration 
Institute for Metabolism and Cell Death (MCD) 
Building 25 - office 122 
HelmholtzZentrum München 
bernd.lentes at helmholtz-muenchen.de 
phone: +49 89 3187 1241 
phone: +49 89 3187 3827 
fax: +49 89 3187 2294 
http://www.helmholtz-muenchen.de/mcd 
