[ClusterLabs] Stonith failing
Gabriele Bulfon
gbulfon at sonicle.com
Tue Jul 28 04:56:59 EDT 2020
Hi, I now have my two nodes (xstha1 and xstha2) with IPs configured by Corosync.
To check how stonith would work, I turned off the Corosync service on the second node.
The first node attempts to stonith the second node and take over its resources, but this fails.
The stonith action is configured to run a custom script that issues ssh commands; both machines have reciprocal authorized keys, so ssh works without a password.
The script does not handle the on/off commands, so it just returns 1 in those cases.
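The log further down shows the `poweroff` request being turned into `ssh-sonicle off`, which is exactly the command the script rejects with rc 1. A minimal Python sketch of an ssh-based plugin that handles `off` as well as `reset` could look like the following. It assumes the cluster-glue external-plugin convention (command in the first argument, target host in the second); the host list and the remote `reboot`/`poweroff` commands are hypothetical placeholders, and the metadata commands a real plugin must also answer are left out.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of an external STONITH plugin that fences over ssh.

Assumes the cluster-glue external/* convention: the command arrives as
argv[1] and, for on/off/reset, the target host as argv[2].
"""
import subprocess
import sys

# Hypothetical: the real ssh-sonicle configuration is not shown in the post.
HOSTS = ["xstha1", "xstha2"]

# Hypothetical remote commands; adjust to whatever the target OS provides.
REMOTE_ACTION = {"reset": "reboot", "off": "poweroff"}


def plan(command, host=None):
    """Map a plugin command to (exit_code, ssh_argv_or_None, stdout_text)."""
    if command == "gethosts":
        return 0, None, "\n".join(HOSTS)
    if command == "status":
        return 0, None, ""
    if command in REMOTE_ACTION:
        if host not in HOSTS:
            return 1, None, ""
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        argv = ["ssh", "-o", "BatchMode=yes", host, REMOTE_ACTION[command]]
        return 0, argv, ""
    # "on" cannot work over ssh (a powered-off host runs no sshd); the
    # getconfignames/getinfo-* metadata commands are omitted from this sketch.
    return 1, None, ""


def main(argv):
    command = argv[1] if len(argv) > 1 else ""
    host = argv[2] if len(argv) > 2 else None
    rc, ssh_argv, out = plan(command, host)
    if out:
        print(out)
    if ssh_argv is not None:
        # Report ssh's own exit status as the plugin result.
        rc = subprocess.run(ssh_argv).returncode
    return rc


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Called as `ssh-sketch off xstha2` (hypothetical file name), it would run `ssh -o BatchMode=yes xstha2 poweroff` and return ssh's exit status. One caveat: the target may close the connection while shutting down, so ssh's exit status is not a fully reliable success signal; the sketch does not try to handle that.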
What I don't understand is the "no route to host": who is trying to do what?
Jul 28 10:48:18 [9636] pengine: warning: stage6: Scheduling Node xstha2 for STONITH
Jul 28 10:48:18 [9636] pengine: info: native_stop_constraints: xstha1-stonith_stop_0 is implicit after xstha2 is fenced
Jul 28 10:48:18 [9636] pengine: info: native_stop_constraints: xstha2_san0_IP_stop_0 is implicit after xstha2 is fenced
Jul 28 10:48:18 [9636] pengine: notice: LogActions: Stop xstha1-stonith (xstha2)
Jul 28 10:48:18 [9636] pengine: info: LogActions: Leave xstha2-stonith (Started xstha1)
Jul 28 10:48:18 [9636] pengine: info: LogActions: Leave xstha1_san0_IP (Started xstha1)
Jul 28 10:48:18 [9636] pengine: notice: LogActions: Move xstha2_san0_IP (Started xstha2 -> xstha1)
Jul 28 10:48:18 [9636] pengine: warning: process_pe_message: Calculated transition 15 (with warnings), saving inputs in /sonicle/var/cluster/lib/pacemaker/pengine/pe-warn-10.bz2
Jul 28 10:48:18 [9637] crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Jul 28 10:48:18 [9637] crmd: info: do_te_invoke: Processing graph 15 (ref=pe_calc-dc-1595926098-89) derived from /sonicle/var/cluster/lib/pacemaker/pengine/pe-warn-10.bz2
Jul 28 10:48:18 [9637] crmd: notice: te_fence_node: Requesting fencing (poweroff) of node xstha2 | action=11 timeout=60000
Jul 28 10:48:18 [9633] stonith-ng: notice: handle_request: Client crmd.9637.bb39d7c9 wants to fence (poweroff) 'xstha2' with device '(any)'
Jul 28 10:48:18 [9633] stonith-ng: notice: initiate_remote_stonith_op: Requesting peer fencing (poweroff) of xstha2 | id=6fe78117-0075-c655-927d-f9326e9a6630 state=0
Jul 28 10:48:19 [9633] stonith-ng: info: process_remote_stonith_query: Query result 1 of 1 from xstha1 for xstha2/poweroff (1 devices) 6fe78117-0075-c655-927d-f9326e9a6630
Jul 28 10:48:19 [9633] stonith-ng: info: call_remote_stonith: Total timeout set to 60 for peer's fencing of xstha2 for crmd.9637|id=6fe78117-0075-c655-927d-f9326e9a6630
Jul 28 10:48:19 [9633] stonith-ng: info: call_remote_stonith: Requesting that 'xstha1' perform op 'xstha2 poweroff' for crmd.9637 (72s, 0s)
Jul 28 10:48:20 [9633] stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'xstha2'
Jul 28 10:48:21 xstorage1 stonith: [12141]: CRIT: external_reset_req: 'ssh-sonicle off' for host xstha2 failed with rc 1
Jul 28 10:48:21 [9633] stonith-ng: info: internal_stonith_action_execute: Attempt 2 to execute fence_legacy (poweroff). remaining timeout is 59
Jul 28 10:48:22 [9632] cib: info: cib_process_ping: Reporting our current digest to xstha1: 91ad9245488736582038cd758d58c08a for 0.8.75 (825f610 0)
Jul 28 10:48:23 xstorage1 stonith: [12152]: CRIT: external_reset_req: 'ssh-sonicle off' for host xstha2 failed with rc 1
Jul 28 10:48:23 [9633] stonith-ng: info: update_remaining_timeout: Attempted to execute agent fence_legacy (poweroff) the maximum number of times (2) allowed
Jul 28 10:48:23 [9633] stonith-ng: error: log_operation: Operation 'poweroff' [12150] (call 14 from crmd.9637) for host 'xstha2' with device 'xstha2-stonith' returned: -61 (No data available)
Jul 28 10:48:23 [9633] stonith-ng: warning: log_operation: xstha2-stonith:12150 [ Performing: stonith -t external/ssh-sonicle -T off xstha2 ]
Jul 28 10:48:23 [9633] stonith-ng: warning: log_operation: xstha2-stonith:12150 [ failed: xstha2 5 ]
Jul 28 10:48:23 [9633] stonith-ng: notice: stonith_choose_peer: Couldn't find anyone to fence (poweroff) xstha2 with any device
Jul 28 10:48:23 [9633] stonith-ng: info: call_remote_stonith: None of the 1 peers are capable of fencing (poweroff) xstha2 for crmd.9637 (1)
Jul 28 10:48:23 [9633] stonith-ng: error: remote_op_done: Operation poweroff of xstha2 by
for crmd.9637 at xstha1.6fe78117: No route to host
Jul 28 10:48:23 [9637] crmd: notice: tengine_stonith_callback: Stonith operation 14/11:15:0:5814817b-c10a-c931-fd0e-e9ee3b3a8e59: No route to host (-148)
Jul 28 10:48:23 [9637] crmd: notice: tengine_stonith_callback: Stonith operation 14 for xstha2 failed (No route to host): aborting transition.
Jul 28 10:48:23 [9637] crmd: notice: abort_transition_graph: Transition aborted: Stonith failed | source=tengine_stonith_callback:749 complete=false
Jul 28 10:48:23 [9637] crmd: notice: tengine_stonith_notify: Peer xstha2 was not terminated (poweroff) by
for xstha1: No route to host (ref=6fe78117-0075-c655-927d-f9326e9a6630) by client crmd.9637
Jul 28 10:48:23 [9637] crmd: notice: run_graph: Transition 15 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=5, Source=/sonicle/var/cluster/lib/pacemaker/pengine/pe-warn-10.bz2): Complete
Jul 28 10:48:23 [9637] crmd: notice: too_many_st_failures: Too many failures to fence xstha2 (13), giving up
Jul 28 10:48:23 [9637] crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Jul 28 10:48:23 [9637] crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
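The "No route to host" text does not have to come from ssh. Judging from the two lines just before it ("Couldn't find anyone to fence" and "None of the 1 peers are capable"), it looks like stonith-ng reporting a negated errno (`-148`) once no device succeeded, just as the device's own failure was logged as `-61 (No data available)`. Errno numbers are platform-specific: 148 is EHOSTUNREACH in the illumos/Solaris numbering, while Linux uses 113. A small Python sketch for decoding such values by name rather than by hard-coded number:

```python
import errno
import os


def describe_rc(rc):
    """Decode a negated errno such as the -61 / -148 values in the log.

    Returns (symbolic_name, message). The message comes from the local C
    library, so its exact wording can differ between platforms.
    """
    code = -rc
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


# Look the numbers up by name, since they differ between platforms.
for name in ("ENODATA", "EHOSTUNREACH"):
    num = getattr(errno, name)
    print(-num, describe_rc(-num))
```

Run on the cluster nodes themselves, this shows which number the local platform uses for each name.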
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon