[ClusterLabs] pcmk_remote evaluation (continued)

Ken Gaillot kgaillot at redhat.com
Mon Dec 11 15:06:03 EST 2017


On Wed, 2017-09-20 at 12:46 +0300, Vladislav Bogdanov wrote:
> Hi,
> 
> as 1.1.17 received a lot of care in pcmk_remote, I decided to try it
> again in a rather big setup (smaller than the previous one, so I'm not
> hit by IPC disconnects here).
> 
> From the first runs there are still some severe issues when cluster
> nodes are fenced.
> 
> The following results were obtained by killing the DC node
> (md12k-3-srv), which was hosting the remote resources for nodes
> es7700-3-srv, es12kxe-3-srv and es12kxe-7-srv.
> After the fence, the new DC (md12k-1-srv) moved those resources as
> follows:
> =====
> Sep 20 08:53:28 md12k-1-srv pengine[2525]:   notice: Move    es12kxe-
> 3-srv      (Started md12k-3-srv -> md12k-4-srv)
> Sep 20 08:53:28 md12k-1-srv pengine[2525]:   notice: Move    es12kxe-
> 7-srv      (Started md12k-3-srv -> md12k-1-srv)
> Sep 20 08:53:28 md12k-1-srv pengine[2525]:   notice: Move    es7700-
> 3-srv       (Started md12k-3-srv -> md12k-2-srv)
> =====
> 
> * The first issue I found (and I expect it to be the cause of some of
> the other issues) is that pacemaker_remote does not drop the old
> crmd's connection after the new crmd connects.
> As IPC proxy connections are kept in a hash table, there is a 50%
> chance that remoted tries to reach the old crmd, e.g. to proxy checks
> of node attributes when resources are reprobed.
> That leads to timeouts of those resources' probes, with the
> consequent reaction from the cluster.
> A solution here could be to drop the old IPC proxy connection as soon
> as the new one is established.

We can't drop connections from the pacemaker_remoted side because it
doesn't know anything about the cluster state (e.g. whether the cluster
connection resource is live-migrating).

However, we can simply always use the most recently connected provider,
which I think solves the issue. See commit e9a7e3bb, one of a few
recent bugfixes in the master branch for pacemaker_remoted. It will
most likely not make it into 2.0 (which I'm trying to keep focused on
deprecated syntax removals), but the next release after that.
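
To make the idea concrete, here is a minimal sketch in plain C. This is
not the actual pacemaker_remoted code; the type and function names are
invented for illustration. The point is simply to keep the proxy
providers ordered newest-first and always route through the head of the
list, so a stale connection left over from a fenced node is never
picked:
=====
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct provider_s {
    char node[64];              /* cluster node hosting the crmd */
    struct provider_s *next;    /* next-most-recently connected  */
} provider_t;

static provider_t *providers = NULL;   /* head = newest connection */

/* Called when a crmd establishes an IPC proxy session: push to front. */
static void provider_connected(const char *node)
{
    provider_t *p = calloc(1, sizeof(*p));

    strncpy(p->node, node, sizeof(p->node) - 1);
    p->next = providers;
    providers = p;
}

/* Requests from local IPC clients always go via the newest provider. */
static const char *current_provider(void)
{
    return providers ? providers->node : NULL;
}

int main(void)
{
    provider_connected("md12k-3-srv");   /* old DC, later fenced   */
    provider_connected("md12k-1-srv");   /* new DC connects later  */

    printf("proxying via %s\n", current_provider()); /* md12k-1-srv */
    return 0;
}
=====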

> * I _suspect_ that the issue above could lead to the following lines
> in a cluster node's logs.
> I didn't check, but I suspect that when remoted decides to disconnect
> the old connection after the fenced node comes back up and its TCP
> connections are reset, it disconnects the new one too.

pacemaker_remoted doesn't really initiate disconnections except when it
exits. All connections and disconnections should be initiated on the
cluster side.

> At least this issue happens at the same time the fenced node rejoins
> the cluster.
> These logs are for the case where no resources operate on node
> attributes (I removed those resources from the CIB and set a
> stickiness for all others).
> =====
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Timed out (10000
> ms) while waiting for remote data
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Unable to receive
> expected reply, disconnecting.
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Remote lrmd server
> disconnected while waiting for reply with id 9823.
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Unexpected
> disconnect on remote-node es12kxe-7-srv

The messages above may be confusing, but the disconnect is originating
on the cluster side, after a timeout waiting for a reply from
pacemaker_remoted.

I don't know of any reason that a cluster node rejoining would cause
pacemaker_remoted to stop working, aside from when the cluster node is
the same host running pacemaker_remoted (which I've done accidentally
during testing).
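
For clarity, the pattern behind the "Timed out (10000 ms) while waiting
for remote data" messages is roughly the following. This is a
simplified sketch, not the real lrmd client API; the function names are
invented. The cluster side sends a request, remembers its reply id, and
tears the connection down itself if no matching reply arrives within
the timeout:
=====
#include <stdbool.h>
#include <stdio.h>

#define REPLY_TIMEOUT_MS 10000

/* Stand-in for polling the remote TLS session for a reply with the
 * given id; it always "times out" here so the error path is shown. */
static bool wait_for_reply(int expected_id, int timeout_ms)
{
    (void) expected_id;
    (void) timeout_ms;
    return false;
}

/* If no reply arrives in time, the cluster side logs the errors quoted
 * above and drops the connection itself; pacemaker_remoted never
 * initiates the disconnect. */
static int remote_request(int request_id)
{
    if (!wait_for_reply(request_id, REPLY_TIMEOUT_MS)) {
        fprintf(stderr,
                "Timed out (%d ms) while waiting for remote data\n",
                REPLY_TIMEOUT_MS);
        fprintf(stderr,
                "Remote lrmd server disconnected while waiting for "
                "reply with id %d.\n", request_id);
        return -1;   /* caller closes the TCP/TLS session */
    }
    return 0;
}

int main(void)
{
    return (remote_request(9823) == 0) ? 0 : 1;
}
=====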

> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Result of monitor
> operation for es12kxe-7-srv on md12k-1-srv: Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Couldn't perform
> lrmd_rsc_info operation (timeout=0): -107: Success (0)
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: LRMD disconnected
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Could not add
> resource ost0033-es03a to LRM es12kxe-7-srv
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Invalid resource
> definition for ost0033-es03a
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad
> input   <create_request_adv origin="te_rsc_command" t="crmd"
> version="3.0.13" subt="request" reference="lrm_invoke-tengine-150
> 5897731-3872" crm_task="lrm_invoke" crm_sys_to="lrmd"
> crm_sys_from="tengine" crm_host_to="md12k-1-srv"
> acl_target="hacluster" crm_user="hacluster">
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad
> input     <crm_xml>
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad
> input       <rsc_op id="221" operation="monitor"
> operation_key="ost0033-es03a_monitor_0" on_node="es12kxe-7-srv"
> on_node_uuid=
> "es12kxe-7-srv" router_node="md12k-1-srv" transition-
> key="221:23:7:7df0b5ef-5bfa-4d21-82e3-4dc964d6cffd">
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad
> input         <primitive id="ost0033-es03a" class="ocf"
> provider="ddn" type="lustre-server"/>
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad
> input         <attributes CRM_meta_on_node="es12kxe-7-srv"
> CRM_meta_on_node_uuid="es12kxe-7-srv" CRM_meta_op_target_rc="7" CRM
> _meta_record_pending="false" CRM_meta_timeout="300000"
> crm_feature_set="3.0.13" device="/dev/ddn/es03a_ost0033"
> directory="/lustre/es03a/ost0033" lustre_resource_type="ost" manage_d
> irectory="1"/>
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad
> input       </rsc_op>
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad
> input     </crm_xml>
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad
> input   </create_request_adv>
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Resource ost0033-
> es03a no longer exists in the lrmd
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 221
> (ost0033-es03a_monitor_0) on es12kxe-7-srv failed (target: 7 vs. rc:
> 6): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:   notice: Transition aborted
> by lrm_rsc_op.ost0033-es03a_last_failure_0: Event failed
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 221
> (ost0033-es03a_monitor_0) on es12kxe-7-srv failed (target: 7 vs. rc:
> 6): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Result of probe
> operation for ost0033-es03a on es12kxe-7-srv: Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:   notice: Transition aborted
> by operation es12kxe-3-srv_monitor_30000 'create' on md12k-4-srv: Old
> event
> ...
> Sep 20 08:56:41 md12k-1-srv attrd[2524]:   notice: Node md12k-3-srv
> state is now member
> Sep 20 08:56:41 md12k-1-srv cib[2511]:   notice: Node md12k-3-srv
> state is now member
> Sep 20 08:56:41 md12k-1-srv pacemakerd[2398]:   notice: Node md12k-3-
> srv state is now member
> Sep 20 08:56:41 md12k-1-srv crmd[11375]:   notice: Node md12k-3-srv
> state is now member
> Sep 20 08:56:41 md12k-1-srv stonith-ng[2522]:   notice: Node md12k-3-
> srv state is now member
> Sep 20 08:56:41 md12k-1-srv crmd[11375]:  warning: No reason to
> expect node 2 to be down
> Sep 20 08:56:41 md12k-1-srv crmd[11375]:   notice: Stonith/shutdown
> of md12k-3-srv not matched
> =====
> I cannot check whether that is true until the first issue is fixed.
> 
> * There are repeated probe results with rc 6 (PCMK_OCF_NOT_CONFIGURED)
> and rc 189 (PCMK_OCF_CONNECTION_DIED) if probes are running on remote
> nodes after their previous *home* node rejoins. There are also some
> with rc 1 (PCMK_OCF_UNKNOWN_ERROR):
> =====
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 221
> (ost0033-es03a_monitor_0) on es12kxe-7-srv failed (target: 7 vs. rc:
> 6): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 221
> (ost0033-es03a_monitor_0) on es12kxe-7-srv failed (target: 7 vs. rc:
> 6): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 194
> (ost0033-es02a_monitor_0) on es12kxe-3-srv failed (target: 7 vs. rc:
> 6): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 194
> (ost0033-es02a_monitor_0) on es12kxe-3-srv failed (target: 7 vs. rc:
> 6): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 195
> (ost0033-es03a_monitor_0) on es12kxe-3-srv failed (target: 7 vs. rc:
> 189): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 195
> (ost0033-es03a_monitor_0) on es12kxe-3-srv failed (target: 7 vs. rc:
> 189): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 196
> (ost0033-es04a_monitor_0) on es12kxe-3-srv failed (target: 7 vs. rc:
> 189): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 196
> (ost0033-es04a_monitor_0) on es12kxe-3-srv failed (target: 7 vs. rc:
> 189): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 246
> (ost0033-es04a_monitor_0) on es7700-3-srv failed (target: 7 vs. rc:
> 189): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 246
> (ost0033-es04a_monitor_0) on es7700-3-srv failed (target: 7 vs. rc:
> 189): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 247
> (ost0034-es01a_monitor_0) on es7700-3-srv failed (target: 7 vs. rc:
> 189): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 247
> (ost0034-es01a_monitor_0) on es7700-3-srv failed (target: 7 vs. rc:
> 189): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 245
> (ost0033-es03a_monitor_0) on es7700-3-srv failed (target: 7 vs. rc:
> 1): Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 245
> (ost0033-es03a_monitor_0) on es7700-3-srv failed (target: 7 vs. rc:
> 1): Error
> =====
> The rc=6 ones look absolutely incorrect.
> The rc=1 one comes from a probe op that finished with rc=7
> (NOT_RUNNING).
> The rc=189 ones seem to be expected.
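
For reference, those rc values map to Pacemaker's OCF exit codes as
shown below (a minimal excerpt matching enum ocf_exitcode in
crm/services.h; the excerpt's enum name is just for this note). A probe
is expected to return NOT_RUNNING, i.e. 7:
=====
/* Excerpt of the exit codes discussed above, with the values they have
 * in Pacemaker's enum ocf_exitcode (crm/services.h). */
enum ocf_exitcode_excerpt {
    PCMK_OCF_UNKNOWN_ERROR   = 1,    /* rc=1:   generic failure        */
    PCMK_OCF_NOT_CONFIGURED  = 6,    /* rc=6:   unexpected from probes */
    PCMK_OCF_NOT_RUNNING     = 7,    /* rc=7:   expected probe result  */
    PCMK_OCF_CONNECTION_DIED = 189,  /* rc=189: remote connection lost */
};
=====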
>
> Hope this info has some value,
> Best regards,
> Vladislav
-- 
Ken Gaillot <kgaillot at redhat.com>



