[Pacemaker] Two node cluster and no hardware device for stonith.

Andrea a.bacchi at codices.com
Fri Jan 30 11:38:20 UTC 2015


Andrea <a.bacchi at ...> writes:

> 
> Sorry, I used the wrong device ID.
> Now, with the correct device ID, I see 2 registered keys
> 
> [ONE] sg_persist -n --read-keys
> --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
>   PR generation=0x4, 2 registered reservation keys follow:
>     0x4d5a0001
>     0x4d5a0002
> 
> Tomorrow I will do some tests of fencing...
> 

Some news:


If I try to fence serverHA2 with this command:
[ONE] pcs stonith fence serverHA2

everything seems to be OK, but serverHA2 freezes.
Below are the logs from each node (serverHA2 freezes after logging these lines).
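A note on the freeze, based on the serverHA1 log below: fence_scsi does not advertise 'reboot', so stonith-ng performs 'off' instead, which only revokes the victim's key on the shared disk; the VM itself is never power-cycled, so it hangs once its I/O starts failing. A way to verify this at the SCSI-reservation level (device path taken from the earlier sg_persist output; which key belongs to which node is my assumption based on the 0x4d5a000N pattern):

```shell
# Sketch: check the effect of fence_scsi on the shared device.
DEV=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8

# Before fencing: both keys registered (0x4d5a0001, 0x4d5a0002).
sg_persist -n --read-keys --device="$DEV"

# After 'pcs stonith fence serverHA2': the fenced node's key should be
# gone, so its writes to the shared disk fail -- hence the apparent
# freeze rather than a reboot.
sg_persist -n --read-keys --device="$DEV"
sg_persist -n --read-reservation --device="$DEV"
```

Note also that the crmd immediately triggers an 'on' (unfencing) operation afterwards, visible later in the log, which re-registers the key.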

The servers are two VMware virtual machines (I have asked for an account on the
ESX server to test fence_vmware; I am waiting for a response).
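For the planned fence_vmware test, a sketch of what it might look like with fence_vmware_soap, assuming that agent is installed; the address, credentials, and VM names below are placeholders, not values from this cluster:

```shell
# First test the agent manually (all values are placeholders):
fence_vmware_soap --ip=esx.example.com --ssl \
    --username=fenceuser --password=secret \
    --plug=serverHA2-vm --action=status

# If that works, a cluster device along these lines, mapping cluster
# node names to VM names via pcmk_host_map:
pcs stonith create vmware-fence fence_vmware_soap \
    ipaddr=esx.example.com ssl=1 login=fenceuser passwd=secret \
    pcmk_host_map="serverHA1:serverHA1-vm;serverHA2:serverHA2-vm"
```

Unlike fence_scsi, this agent can actually power-cycle the VM, so 'reboot' would behave as expected.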


log serverHA1


Jan 30 12:13:02 [2510] serverHA1 stonith-ng:   notice: handle_request: 	Client 
stonith_admin.1907.b13e0290 wants to fence (reboot) 'serverHA2' with device 
'(any)'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:   notice: 
initiate_remote_stonith_op: 	Initiating remote operation reboot for 
serverHA2: 70b75107-8919-4510-9c6c-7cc65e6a00a6 (0)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:   notice: 
can_fence_host_with_device: 	iscsi-stonith-device can fence (reboot) 
serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:     info: 
process_remote_stonith_query: 	Query result 1 of 2 from serverHA1 for 
serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:     info: call_remote_stonith: 	
Total remote op timeout set to 120 for fencing of node serverHA2 for 
stonith_admin.1907.70b75107
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:     info: call_remote_stonith: 	
Requesting that serverHA1 perform op reboot serverHA2 for stonith_admin.1907 
(144s)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:   notice: 
can_fence_host_with_device: 	iscsi-stonith-device can fence (reboot) 
serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:     info: 
stonith_fence_get_devices_cb: 	Found 1 matching devices for 'serverHA2'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:  warning: stonith_device_execute: 	
Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off' 
action instead
Jan 30 12:13:02 [2510] serverHA1 stonith-ng:     info: 
process_remote_stonith_query: 	Query result 2 of 2 from serverHA2 for 
serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:   notice: log_operation: 	
Operation 'reboot' [1908] (call 2 from stonith_admin.1907) for host 'serverHA2' 
with device 'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:  warning: get_xpath_object: 	
No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:   notice: remote_op_done: 	
Operation reboot of serverHA2 by serverHA1 for 
stonith_admin.1907 at serverHA1.70b75107: OK
Jan 30 12:13:03 [2514] serverHA1       crmd:   notice: tengine_stonith_notify: 	
Peer serverHA2 was terminated (reboot) by serverHA1 for serverHA1: OK 
(ref=70b75107-8919-4510-9c6c-7cc65e6a00a6) by client stonith_admin.1907
Jan 30 12:13:03 [2514] serverHA1       crmd:   notice: tengine_stonith_notify: 	
Notified CMAN that 'serverHA2' is now fenced
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: crm_update_peer_join: 	
crmd_peer_down: Node serverHA2[2] - join-2 phase 4 -> 0
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: 
crm_update_peer_expected: 	crmd_peer_down: Node serverHA2[2] - expected 
state is now down (was member)
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: erase_status_tag: 	
Deleting xpath: //node_state[@uname='serverHA2']/lrm
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: erase_status_tag: 	
Deleting xpath: //node_state[@uname='serverHA2']/transient_attributes
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: tengine_stonith_notify: 	
External fencing operation from stonith_admin.1907 fenced serverHA2
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: abort_transition_graph: 	
Transition aborted: External Fencing Operation 
(source=tengine_stonith_notify:248, 1)
Jan 30 12:13:03 [2514] serverHA1       crmd:   notice: do_state_transition: 	
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 30 12:13:03 [2514] serverHA1       crmd:  warning: do_state_transition: 	
Only 1 of 2 cluster nodes are eligible to run resources - continue 0
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_process_request: 	
Forwarding cib_modify operation for section status to master 
(origin=local/crmd/333)
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_process_request: 	
Forwarding cib_delete operation for section 
//node_state[@uname='serverHA2']/lrm to master (origin=local/crmd/334)
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_process_request: 	
Forwarding cib_delete operation for section 
//node_state[@uname='serverHA2']/transient_attributes to master 
(origin=local/crmd/335)
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	Diff: 
--- 0.51.86 2
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	Diff: 
+++ 0.51.87 (null)
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	+  
/cib:  @num_updates=87
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	+  
/cib/status/node_state[@id='serverHA2']:  
@crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_process_request: 	
Completed cib_modify operation for section status: OK (rc=0, 
origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	Diff: 
--- 0.51.87 2
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	Diff: 
+++ 0.51.88 (null)
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	-- 
/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	+  
/cib:  @num_updates=88
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_process_request: 	
Completed cib_delete operation for section 
//node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334, 
version=0.51.88)
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	Diff: 
--- 0.51.88 2
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	Diff: 
+++ 0.51.89 (null)
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	-- 
/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_perform_op: 	+  
/cib:  @num_updates=89
Jan 30 12:13:03 [2509] serverHA1        cib:     info: cib_process_request: 	
Completed cib_delete operation for section 
//node_state[@uname='serverHA2']/transient_attributes: OK (rc=0, 
origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: cib_fencing_updated: 	
Fencing update 333 for serverHA2: complete
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: abort_transition_graph: 	
Transition aborted by deletion of lrm[@id='serverHA2']: Resource state removal 
(cib=0.51.88, source=te_update_diff:429, 
path=/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2'], 1)
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: abort_transition_graph: 	
Transition aborted by deletion of transient_attributes[@id='serverHA2']: 
Transient attribute change (cib=0.51.89, source=te_update_diff:391, 
path=/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2
'], 1)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: process_pe_message: 	
Input has not changed since last time, not saving to disk
Jan 30 12:13:03 [2513] serverHA1    pengine:   notice: unpack_config: 	On loss 
of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: 
determine_online_status_fencing: 	Node serverHA2 is active
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: determine_online_status: 
	Node serverHA2 is online
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: 
determine_online_status_fencing: 	Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: determine_online_status: 
	Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: clone_print: 	 Clone 
Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: short_print: 	     
Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: clone_print: 	 Clone 
Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: short_print: 	     
Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: native_print: 	
iscsi-stonith-device	(stonith:fence_scsi):	Started serverHA1 
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
ping:0	(Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
ping:1	(Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
clusterfs:0	(Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
clusterfs:1	(Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
iscsi-stonith-device	(Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: handle_response: 	
pe_calc calculation pe_calc-dc-1422616383-286 is obsolete
Jan 30 12:13:03 [2513] serverHA1    pengine:   notice: process_pe_message: 	
Calculated Transition 189: /var/lib/pacemaker/pengine/pe-input-145.bz2
Jan 30 12:13:03 [2513] serverHA1    pengine:   notice: unpack_config: 	On loss 
of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: 
determine_online_status_fencing: 	- Node serverHA2 is not ready to run 
resources
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: determine_online_status: 
	Node serverHA2 is pending
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: 
determine_online_status_fencing: 	Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: determine_online_status: 
	Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: clone_print: 	 Clone 
Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: short_print: 	     
Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: short_print: 	     
Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: clone_print: 	 Clone 
Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: short_print: 	     
Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: short_print: 	     
Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: native_print: 	
iscsi-stonith-device	(stonith:fence_scsi):	Started serverHA1 
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: native_color: 	
Resource ping:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: native_color: 	
Resource clusterfs:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: probe_resources: 	
Action probe_complete-serverHA2 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1    pengine:  warning: custom_action: 	Action 
ping:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1    pengine:  warning: custom_action: 	Action 
clusterfs:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1    pengine:  warning: custom_action: 	Action 
iscsi-stonith-device_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1    pengine:   notice: trigger_unfencing: 	
Unfencing serverHA2: node discovery
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
ping:0	(Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
ping:1	(Stopped)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
clusterfs:0	(Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
clusterfs:1	(Stopped)
Jan 30 12:13:03 [2513] serverHA1    pengine:     info: LogActions: 	Leave   
iscsi-stonith-device	(Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: do_state_transition: 	
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: do_te_invoke: 	
Processing graph 190 (ref=pe_calc-dc-1422616383-287) derived from 
/var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2513] serverHA1    pengine:   notice: process_pe_message: 	
Calculated Transition 190: /var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2514] serverHA1       crmd:   notice: te_fence_node: 	
Executing on fencing operation (5) on serverHA2 (timeout=60000)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:   notice: handle_request: 	Client 
crmd.2514.b5961dc1 wants to fence (on) 'serverHA2' with device '(any)'
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:   notice: 
initiate_remote_stonith_op: 	Initiating remote operation on for serverHA2: 
e19629dc-bec3-4e63-baf6-a7ecd5ed44bb (0)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:     info: 
process_remote_stonith_query: 	Query result 2 of 2 from serverHA2 for 
serverHA2/on (1 devices) e19629dc-bec3-4e63-baf6-a7ecd5ed44bb
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:     info: 
process_remote_stonith_query: 	All queries have arrived, continuing (2, 2, 2) 
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:     info: call_remote_stonith: 	
Total remote op timeout set to 60 for fencing of node serverHA2 for 
crmd.2514.e19629dc
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:     info: call_remote_stonith: 	
Requesting that serverHA2 perform op on serverHA2 for crmd.2514 (72s)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:  warning: get_xpath_object: 	
No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng:   notice: remote_op_done: 	
Operation on of serverHA2 by serverHA2 for crmd.2514 at serverHA1.e19629dc: OK
Jan 30 12:13:03 [2514] serverHA1       crmd:   notice: 
tengine_stonith_callback: 	Stonith operation 
9/5:190:0:4e500b84-bb92-4406-8f9c-f4140dd40ec7: OK (0)
Jan 30 12:13:03 [2514] serverHA1       crmd:   notice: tengine_stonith_notify: 	
serverHA2 was successfully unfenced by serverHA2 (at the request of serverHA1)
Jan 30 12:13:03 [2514] serverHA1       crmd:   notice: run_graph: 	
Transition 190 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-146.bz2): Complete
Jan 30 12:13:03 [2514] serverHA1       crmd:     info: do_log: 	FSA: Input 
I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jan 30 12:13:03 [2514] serverHA1       crmd:   notice: do_state_transition: 	
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]





log serverHA2



Jan 30 12:13:11 [2627] serverHA2 stonith-ng:   notice: 
can_fence_host_with_device: 	iscsi-stonith-device can fence (reboot) 
serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng:   notice: remote_op_done: 	
Operation reboot of serverHA2 by serverHA1 for 
stonith_admin.1907 at serverHA1.70b75107: OK
Jan 30 12:13:11 [2631] serverHA2       crmd:     crit: tengine_stonith_notify: 	
We were alegedly just fenced by serverHA1 for serverHA1!
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	Diff: 
--- 0.51.86 2
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	Diff: 
+++ 0.51.87 (null)
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	+  
/cib:  @num_updates=87
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	+  
/cib/status/node_state[@id='serverHA2']:  
@crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_process_request: 	
Completed cib_modify operation for section status: OK (rc=0, 
origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	Diff: 
--- 0.51.87 2
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	Diff: 
+++ 0.51.88 (null)
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	-- 
/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	+  
/cib:  @num_updates=88
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_process_request: 	
Completed cib_delete operation for section 
//node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334, 
version=0.51.88)
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	Diff: 
--- 0.51.88 2
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	Diff: 
+++ 0.51.89 (null)
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	-- 
/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_perform_op: 	+  
/cib:  @num_updates=89
Jan 30 12:13:11 [2626] serverHA2        cib:     info: cib_process_request: 	
Completed cib_delete operation for section 
//node_state[@uname='serverHA2']/transient_attributes: OK (rc=0, 
origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng:   notice: 
can_fence_host_with_device: 	iscsi-stonith-device can fence (on) serverHA2: 
static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng:   notice: 
can_fence_host_with_device: 	iscsi-stonith-device can fence (on) serverHA2: 
static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng:     info: 
stonith_fence_get_devices_cb: 	Found 1 matching devices for 'serverHA2'
Jan 30 12:13:11 [2627] serverHA2 stonith-ng:   notice: log_operation: 	
Operation 'on' [3037] (call 9 from crmd.2514) for host 'serverHA2' with device 
'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng:   notice: remote_op_done: 	
Operation on of serverHA2 by serverHA2 for crmd.2514 at serverHA1.e19629dc: OK



I will continue testing...


Andrea




