[ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

Andrei Borzenkov arvidjaar at gmail.com
Fri Nov 5 04:22:56 EDT 2021


On 05.11.2021 01:20, Ken Gaillot wrote:
>>
>> There are two issues discussed in this thread.
>>
>> 1. Remote node is fenced when connection with this node is lost. For
>> all
>> I can tell this is intended and expected behavior. That was the
>> original
>> question.
> 
> It's expected only because the connection can't be recovered elsewhere.
> If another node can run the connection, pacemaker will try to reconnect
> from there and re-probe everything to make sure what the current state
> is.
> 

That's not what I see in the sources and documentation, and not what I
observe. Pacemaker will reprobe from another node only after attempting
to fence the remote node.

Documentation (Pacemaker Remote):

  reconnect_interval: If this is a positive time interval, the cluster
                      will attempt to reconnect to a remote node after an
                      active connection has been lost at this interval.
                      Otherwise, the cluster will attempt to reconnect
                      immediately (*after any fencing needed*).
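
(On a live cluster, the value actually in effect can be checked with
"crm_resource --resource qnetd --get-parameter reconnect_interval";
just a sketch, using the resource name from the configuration below.)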

Config:

node 1: ha1 \
	attributes pingd=1 \
	utilization cpu=20
node 2: ha2 \
	attributes pingd=1 \
	utilization cpu=20
primitive qnetd ocf:pacemaker:remote \
	params reconnect_interval=30s \
	op monitor timeout=5s interval=10s \
	meta target-role=Started
primitive stonith_sbd stonith:external/sbd \
	op monitor interval=3600
property cib-bootstrap-options: \
	cluster-infrastructure=corosync \
	cluster-name=ha \
	dc-version="2.1.0+20210816.c6a4f6e6c-1.1-2.1.0+20210816.c6a4f6e6c" \
	last-lrm-refresh=1635607816 \
	stonith-enabled=true \
	have-watchdog=true \
	stonith-watchdog-timeout=0 \
	placement-strategy=balanced


Logs (skipping CIB updates, remote node active on ha1, ha2 is DC):

Nov 05 10:31:26.742 ha1 pacemaker-controld  [3246]
(monitor_timeout_cb at controld_remote_ra.c:474)        info: Timed out
waiting for remote poke response from qnetd

Nov 05 10:31:26.742 ha1 pacemaker-controld  [3246]
(process_lrm_event at controld_execd.c:2826)    error: Result of monitor
operation for qnetd on ha1: Timed Out | call=3 key=qnetd_monitor_10000
timeout=5000ms

Nov 05 10:31:26.742 ha1 pacemaker-controld  [3246]
(lrmd_api_disconnect at lrmd_client.c:1640)     info: Disconnecting TLS
qnetd executor connection

Nov 05 10:31:26.742 ha1 pacemaker-controld  [3246]
(lrmd_tls_connection_destroy at lrmd_client.c:562)      info: TLS
connection destroyed

Nov 05 10:31:26.742 ha1 pacemaker-controld  [3246]
(remote_lrm_op_callback at controld_remote_ra.c:578)    error: Lost
connection to Pacemaker Remote node qnetd

Nov 05 10:31:26.742 ha1 pacemaker-controld  [3246]
(lrmd_api_disconnect at lrmd_client.c:1640)     info: Disconnecting TLS
qnetd executor connection


Nov 05 10:31:26.773 ha2 pacemaker-schedulerd[3313]
(determine_online_status_fencing at unpack.c:1434)      info: Node ha2 is
active

Nov 05 10:31:26.773 ha2 pacemaker-schedulerd[3313]
(determine_online_status at unpack.c:1574)      info: Node ha2 is online

Nov 05 10:31:26.773 ha2 pacemaker-schedulerd[3313]
(determine_online_status_fencing at unpack.c:1434)      info: Node ha1 is
active

Nov 05 10:31:26.773 ha2 pacemaker-schedulerd[3313]
(determine_online_status at unpack.c:1574)      info: Node ha1 is online

Nov 05 10:31:26.773 ha2 pacemaker-schedulerd[3313]
(unpack_rsc_op_failure at unpack.c:3022)        warning: Unexpected result
(error) was recorded for monitor of qnetd on ha1 at Nov  5 10:31:26 2021
| rc=1 id=qnetd_last_failure_0

Nov 05 10:31:26.773 ha2 pacemaker-schedulerd[3313]
(unpack_rsc_op_failure at unpack.c:3117)        notice: qnetd will not be
started under current conditions

Nov 05 10:31:26.773 ha2 pacemaker-schedulerd[3313]
(pe_fence_node at unpack.c:143)         warning: Remote node qnetd will be
fenced: remote connection is unrecoverable

Nov 05 10:31:26.773 ha2 pacemaker-schedulerd[3313]
(pcmk__native_allocate at pcmk_sched_native.c:626)      info: Resource
qnetd cannot run anywhere

...
Nov 05 10:31:26.777 ha2 pacemaker-schedulerd[3313]
(stage6 at pcmk_sched_allocate.c:1634)  warning: Scheduling Node qnetd for
STONITH

Nov 05 10:31:26.777 ha2 pacemaker-schedulerd[3313]
(log_list_item at output_log.c:198)     notice: Actions: Fence (reboot)
qnetd 'remote connection is unrecoverable'

Nov 05 10:31:26.777 ha2 pacemaker-schedulerd[3313]
(rsc_action_default at pcmk_output.c:928)       info: Leave   stonith_sbd
     (Started ha2)

Nov 05 10:31:26.777 ha2 pacemaker-schedulerd[3313]
(log_list_item at output_log.c:198)     notice: Actions: Stop       qnetd
         (   ha1 )  due to node availability

...
Nov 05 10:31:26.869 ha2 pacemaker-controld  [3314]
(te_rsc_command at controld_te_actions.c:320)   notice: Initiating stop
operation qnetd_stop_0 on ha1 | action 4

Nov 05 10:31:26.869 ha2 pacemaker-controld  [3314]
(te_fence_node at controld_fencing.c:869)       notice: Requesting fencing
(reboot) of node qnetd | action=3 timeout=60000

Nov 05 10:31:26.869 ha2 pacemaker-fenced    [3310]
(handle_request at fenced_commands.c:2759)      notice: Client
pacemaker-controld.3314 wants to fence (reboot) qnetd using any device

Nov 05 10:31:26.869 ha2 pacemaker-fenced    [3310]
(initiate_remote_stonith_op at fenced_remote.c:1162)    notice: Requesting
peer fencing (reboot) targeting qnetd | id=9c3bee1c state=querying
base_timeout=60

...
Nov 05 10:31:38.441 ha2 pacemaker-fenced    [3310]
(log_operation at fenced_commands.c:2122)       notice: Operation 'reboot'
[3568] (call 2 from pacemaker-controld.3314) targeting qnetd using
stonith_sbd returned 0 (OK)

Nov 05 10:31:38.445 ha2 pacemaker-fenced    [3310]
(remote_op_done at fenced_remote.c:544)         notice: Operation 'reboot'
targeting qnetd by ha2 for pacemaker-controld.3314 at ha2: OK | id=9c3bee1c

Nov 05 10:31:38.445 ha2 pacemaker-controld  [3314]
(tengine_stonith_callback at controld_fencing.c:721)    notice: Stonith
operation 2/3:3:0:181e4342-7bd4-49c7-895e-aad769dd7f0a: OK (0)

Nov 05 10:31:38.445 ha2 pacemaker-controld  [3314]
(tengine_stonith_callback at controld_fencing.c:754)    info: Stonith
operation 2 for qnetd passed

...
Nov 05 10:31:38.449 ha2 pacemaker-controld  [3314]
(tengine_stonith_notify at controld_fencing.c:502)      notice: Peer qnetd
was terminated (reboot) by ha2 on behalf of pacemaker-controld.3314: OK
| initiator=ha2 ref=9c3bee1c-a4ab-46b5-9c79-9a0e1b3b8841

Nov 05 10:31:38.449 ha2 pacemaker-controld  [3314]
(controld_delete_node_state at controld_based.c:258)    info: Deleting all
state for node qnetd (via CIB call 46) |
xpath=//node_state[@uname='qnetd']/*

...
Nov 05 10:31:57.449 ha2 pacemaker-controld  [3314]
(crm_timer_popped at controld_timers.c:122)     info: Cluster Recheck Timer
just popped | input=I_PE_CALC time=19000ms


That message is interesting. I do not think this was the cluster-recheck
timeout; it fired far too early for that. It is probably a consequence of
the successful fencing, but the message is certainly misleading.
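
(As a sketch only: the configured interval can be queried with

crm_attribute --type crm_config --name cluster-recheck-interval --query

which errors out if the property was never set explicitly. The default is
15 minutes, so a timer popping roughly 19 seconds after the fencing
completed is clearly something else.)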

Nov 05 10:31:57.449 ha2 pacemaker-controld  [3314]
(do_state_transition at controld_fsa.c:543)     notice: State transition
S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped

Nov 05 10:31:57.449 ha2 pacemaker-controld  [3314]
(do_state_transition at controld_fsa.c:629)     info: Progressed to state
S_POLICY_ENGINE after C_TIMER_POPPED

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(unpack_config at unpack.c:226)         info: Watchdog-based self-fencing
will be performed via SBD if fencing is required and
stonith-watchdog-timeout is nonzero

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(determine_online_status_fencing at unpack.c:1434)      info: Node ha2 is
active

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(determine_online_status at unpack.c:1574)      info: Node ha2 is online

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(determine_online_status_fencing at unpack.c:1434)      info: Node ha1 is
active

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(determine_online_status at unpack.c:1574)      info: Node ha1 is online

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(pe_get_failcount at failcounts.c:344)  info: qnetd has failed 1 times on ha1

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(pe__clear_failcount at failcounts.c:376)       notice: Clearing failure of
qnetd on ha1 because it expired | qnetd_clear_failcount_0

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(check_operation_expiry at unpack.c:3486)       info: Clearing monitor
failure will wait until any scheduled fencing of qnetd completes

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(pe_get_failcount at failcounts.c:344)  info: qnetd has failed 1 times on ha1

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(pe__clear_failcount at failcounts.c:376)       notice: Clearing failure of
qnetd on ha1 because it expired | qnetd_clear_failcount_0

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(check_operation_expiry at unpack.c:3486)       info: Clearing monitor
failure will wait until any scheduled fencing of qnetd completes

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(unpack_rsc_op at unpack.c:3760)        notice: Rescheduling
qnetd_monitor_10000 after failure expired on ha1 | actual=1 expected=0
magic=2:1;6:1:0:181e4342-7bd4-49c7-895e-aad769dd7f0a

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(pe_get_failcount at failcounts.c:344)  info: qnetd has failed 1 times on ha1

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(pe__clear_failcount at failcounts.c:376)       notice: Clearing failure of
qnetd on ha1 because it expired | qnetd_clear_failcount_0

...
Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(rsc_action_default at pcmk_output.c:928)       info: Leave   stonith_sbd
     (Started ha2)

Nov 05 10:31:57.453 ha2 pacemaker-schedulerd[3313]
(log_list_item at output_log.c:198)     notice: Actions: Start      qnetd
         (   ha1 )


Nov 05 10:31:57.462 ha1 pacemaker-controld  [3246]
(do_lrm_rsc_op at controld_execd.c:2281)        notice: Requesting local
execution of start operation for qnetd on ha1 |
transition_key=6:4:0:181e4342-7bd4-49c7-895e-aad769dd7f0a
op_key=qnetd_start_0

Nov 05 10:31:57.466 ha1 pacemaker-controld  [3246]
(pcmk__connect_remote at remote.c:1126)         info: Attempting remote
connection to 192.168.1.3:3121

...
Nov 05 10:32:03.974 ha1 pacemaker-controld  [3246]
(lrmd_tls_connection_destroy at lrmd_client.c:562)      info: TLS
connection destroyed

Nov 05 10:32:03.974 ha1 pacemaker-controld  [3246]
(lrmd_tcp_connect_cb at lrmd_client.c:1381)     info: Could not connect to
Pacemaker Remote at qnetd:3121: No route to host | rc=113

Nov 05 10:32:04.978 ha1 pacemaker-controld  [3246]
(pcmk__connect_remote at remote.c:1126)         info: Attempting remote
connection to 192.168.1.3:3121

Nov 05 10:32:06.978 ha1 pacemaker-controld  [3246]
(lrmd_tls_connection_destroy at lrmd_client.c:562)      info: TLS
connection destroyed

Nov 05 10:32:13.990 ha1 pacemaker-controld  [3246]
(pcmk__connect_remote at remote.c:1126)         info: Attempting remote
connection to 192.168.1.3:3121

Nov 05 10:32:15.994 ha1 pacemaker-controld  [3246]
(lrmd_tls_connection_destroy at lrmd_client.c:562)      info: TLS
connection destroyed

Nov 05 10:32:15.994 ha1 pacemaker-controld  [3246]
(lrmd_tcp_connect_cb at lrmd_client.c:1381)     info: Could not connect to
Pacemaker Remote at qnetd:3121: No route to host | rc=113

Nov 05 10:32:15.994 ha1 pacemaker-controld  [3246]
(process_lrm_event at controld_execd.c:2826)    error: Result of start
operation for qnetd on ha1: Timed Out | call=5 key=qnetd_start_0
timeout=20000ms



Nov 05 10:32:16.001 ha2 pacemaker-controld  [3314]
(update_failcount at controld_te_events.c:172)  info: Updating failcount
for qnetd on ha1 after failed start: rc=1 (update=INFINITY, time=1636097536)

...
Nov 05 10:32:16.041 ha2 pacemaker-schedulerd[3313]
(log_list_item at output_log.c:198)     notice: Actions: Move
stonith_sbd     ( ha2 -> ha1 )

Nov 05 10:32:16.045 ha2 pacemaker-schedulerd[3313]
(log_list_item at output_log.c:198)     notice: Actions: Recover    qnetd
         ( ha1 -> ha2 )



And only now does it move the resource to ha2, where it finally succeeds.
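
A minimal sketch of one way to provoke this, assuming the default
Pacemaker Remote port 3121 seen in the logs above (a firewall rule is
just one option for breaking the connection and will not reproduce the
exact "No route to host" errors):

# on the remote node: cut off pacemaker_remoted so that the running
# monitor times out and later reconnect attempts fail as well
iptables -I INPUT 1 -p tcp --dport 3121 -j DROP

# on a cluster node: watch the monitor failure, the fencing request and
# the subsequent start attempts
crm_mon -1rf
stonith_admin --history qnetd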


> 
>> 2. Remote resource appears to not fail over. I cannot reproduce it,
>> but
>> then we also do not have the complete CIB, so something may affect
>> it.
>> OTOH logs shown stop before fencing has possibly succeeded, so may be
>> remote resource *did* fail over.
>>
>> What I see is - connection to remote node is lost, pacemaker fences
>> remote node and attempts to restart remote resource, if this is
>> unsuccessful (meaning - connection still could not be established)
>> remote resource fails over to another node.
>>
>> I do not know if it is possible to avoid fencing of remote node under
>> described conditions.
>>
>> What is somewhat interesting (and looks like a bug) - in my testing
>> pacemaker ignored failed fencing attempt and proceeded with
>> restarting
>> of remote resource. Is it expected behavior?
> 
> I don't see a failed fencing attempt (or any result of the fencing
> attempt) in the logs in the original message, only failures of the
> connection monitor.
> 

Here are the logs without the reconnect_interval parameter explicitly
defined:

Nov 05 11:08:37.339 ha1 pacemaker-controld  [4649]
(lrmd_api_disconnect at lrmd_client.c:1640)     info: Disconnecting TLS
qnetd executor connection

Nov 05 11:08:37.339 ha1 pacemaker-controld  [4649]
(lrmd_tls_connection_destroy at lrmd_client.c:562)      info: TLS
connection destroyed

Nov 05 11:08:37.339 ha1 pacemaker-controld  [4649]
(remote_lrm_op_callback at controld_remote_ra.c:578)    error: Lost
connection to Pacemaker Remote node qnetd

Nov 05 11:08:37.339 ha1 pacemaker-controld  [4649]
(lrmd_api_disconnect at lrmd_client.c:1640)     info: Disconnecting TLS
qnetd executor connection

...
Nov 05 11:08:37.355 ha1 pacemaker-schedulerd[4648]
(unpack_rsc_op_failure at unpack.c:3022)        warning: Unexpected result
(error) was recorded for monitor of qnetd on ha1 at Nov  5 11:08:37 2021
| rc=1 id=qnetd_last_failure_0

Nov 05 11:08:37.355 ha1 pacemaker-schedulerd[4648]
(pe_fence_node at unpack.c:143)         warning: Remote node qnetd will be
fenced: remote connection is unrecoverable

Nov 05 11:08:37.355 ha1 pacemaker-schedulerd[4648]
(RecurringOp at pcmk_sched_native.c:855)        info:  Start recurring
monitor (10s) for qnetd on ha1

Nov 05 11:08:37.355 ha1 pacemaker-schedulerd[4648]
(stage6 at pcmk_sched_allocate.c:1634)  warning: Scheduling Node qnetd for
STONITH

Nov 05 11:08:37.355 ha1 pacemaker-schedulerd[4648]
(log_list_item at output_log.c:198)     notice: Actions: Fence (reboot)
qnetd 'remote connection is unrecoverable'

Nov 05 11:08:37.355 ha1 pacemaker-schedulerd[4648]
(rsc_action_default at pcmk_output.c:928)       info: Leave   stonith_sbd
     (Started ha2)

Nov 05 11:08:37.355 ha1 pacemaker-schedulerd[4648]
(log_list_item at output_log.c:198)     notice: Actions: Recover    qnetd
         (   ha1 )

...
Nov 05 11:08:37.363 ha1 pacemaker-fenced    [4645]
(can_fence_host_with_device at fenced_commands.c:1784)  notice: stonith_sbd
is not eligible to fence (reboot) qnetd: static-list

...
Nov 05 11:08:37.375 ha1 pacemaker-fenced    [4645]
(stonith_choose_peer at fenced_remote.c:1301)   notice: Couldn't find
anyone to fence (reboot) qnetd using any device

...
Nov 05 11:08:37.379 ha1 pacemaker-fenced    [4645]
(remote_op_done at fenced_remote.c:544)         error: Operation 'reboot'
targeting qnetd by unknown node for pacemaker-controld.4649 at ha1: No such
device | id=260f5b86

Nov 05 11:08:37.379 ha1 pacemaker-controld  [4649]
(tengine_stonith_callback at controld_fencing.c:721)    notice: Stonith
operation 2/2:2:0:809530af-8cc5-4ff5-9193-bfd2ddc6a442: No such device (-19)

Nov 05 11:08:37.379 ha1 pacemaker-controld  [4649]
(tengine_stonith_callback at controld_fencing.c:797)    notice: Stonith
operation 2 for qnetd failed (No such device): aborting transition.

... and still

Nov 05 11:08:37.467 ha1 pacemaker-schedulerd[4648]
(RecurringOp at pcmk_sched_native.c:855)        info:  Start recurring
monitor (10s) for qnetd on ha1

Nov 05 11:08:37.467 ha1 pacemaker-schedulerd[4648]
(rsc_action_default at pcmk_output.c:928)       info: Leave   stonith_sbd
     (Started ha2)

Nov 05 11:08:37.467 ha1 pacemaker-schedulerd[4648]
(log_list_item at output_log.c:198)     notice: Actions: Start      qnetd
         (   ha1 )



That start fails, and only then does the resource move to ha2.

The difference seems to be the reconnect_interval parameter. If it is
present in the remote resource definition, pacemaker will not proceed
with recovery after a failed fencing attempt.
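
For comparison, the two resource definitions were roughly as follows; the
first is taken from the configuration above, the second is only my
reconstruction of the run without reconnect_interval:

primitive qnetd ocf:pacemaker:remote \
	params reconnect_interval=30s \
	op monitor timeout=5s interval=10s \
	meta target-role=Started

primitive qnetd ocf:pacemaker:remote \
	op monitor timeout=5s interval=10s \
	meta target-role=Started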

As there is no real documentation of how this is supposed to work, I do
not know whether all of this is a bug or not. But one thing is certain:
when the connection to a remote node is lost, the first thing pacemaker
does is fence it, and only then does it initiate any recovery action.

