[ClusterLabs] Pacemaker remote - invalid message detected, endian mismatch

Jan Pokorný jpokorny at redhat.com
Fri Sep 30 17:53:51 EDT 2016


On 30/09/16 11:28 -0500, Radoslaw Garbacz wrote:
> I posted a question about this error in another thread, but that thread
> was old and got no answer, so I suspect it was missed. Apologies for
> repeating it.
> 
> The problem: when my cluster grows to around 40 remote nodes, some remote
> nodes go offline after a while and their logs report message errors. The
> other logs give no indication that anything is wrong.

I think I have a plausible explanation, provided the following can
actually happen (I am not sure right now; perhaps the ipc proxy setup
allows it): two messages are transmitted over the same connection, and
the second one is read as part of the first.
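
To illustrate the kind of failure I have in mind, here is a minimal
sketch. The frame layout (marker, length, payload) and the helper names
are made up for illustration and are not Pacemaker's actual wire format.
0xBADADBBD is assumed to be the expected marker only because it is the
first value printed in the log below. The point is just that a reader
which loses track of message boundaries ends up interpreting payload
bytes as a header, and the marker check then fails in both byte orders.

```python
import struct

# Assumption: expected marker value, taken from the first number in the log.
ENDIAN_LOCAL = 0xBADADBBD


def swab32(value):
    """Byte-swap a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def check_marker(raw):
    """Classify the first four bytes of a buffer as a header marker."""
    (marker,) = struct.unpack("<I", raw[:4])
    if marker == ENDIAN_LOCAL:
        return "native"
    if swab32(marker) == ENDIAN_LOCAL:
        return "swapped"
    return "mismatch"


def frame(payload):
    """Hypothetical frame: 4-byte marker, 4-byte payload length, payload."""
    return struct.pack("<II", ENDIAN_LOCAL, len(payload)) + payload


# Two messages sent back to back over the same connection.
stream = frame(b"<first/>") + frame(b"<second/>")

# A reader that honours the length field lands on the second header.
(first_len,) = struct.unpack("<I", stream[4:8])
aligned = stream[8 + first_len:]

# A reader that has lost the boundary lands inside payload text instead.
misaligned = stream[10:]

print(check_marker(aligned))      # header where expected
print(check_marker(misaligned))   # payload bytes taken for a header
```

Incidentally, the value 63646330 from the log corresponds to the ASCII
bytes "0cdc" on a little-endian machine, which would fit payload text
being read where a header was expected, though that is only an
inference from the numbers.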

Could you please try running pacemaker_remoted with
"PCMK_trace_files=remote.c" in the respective "sysconfig" file?

> Details:
> - 40 ec2 m3.xlarge nodes, 1 corosync ring member, 39 remote
> - maybe irrelevant, but either "cib" or "pengine" process goes to ~100% CPU
> - it does not happen immediately
> - smaller cluster (~20 remote nodes) does not show any problems
> - pacemaker: 1.1.15-1.1f8e642.git.el6.x86_64
> - corosync: 2.4.1-1.2.0da1.el6.x86_64
> - libqb-1.0.0-1.28.4dff.el6.x86_64
> - CentOS 6
> 
> Logs:
> 
> [...]
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_abort:        crm_remote_header: Triggered assert at remote.c:119 :
> endian == ENDIAN_LOCAL
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_remote_header:        Invalid message detected, endian mismatch:
> badadbbd is neither 63646330 nor the swab'd 30636463
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_abort:        crm_remote_header: Triggered assert at remote.c:119 :
> endian == ENDIAN_LOCAL
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_remote_header:        Invalid message detected, endian mismatch:
> badadbbd is neither 63646330 nor the swab'd 30636463
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_abort:        crm_remote_header: Triggered assert at remote.c:119 :
> endian == ENDIAN_LOCAL
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_remote_header:        Invalid message detected, endian mismatch:
> badadbbd is neither 63646330 nor the swab'd 30636463
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:     info:
> lrmd_remote_client_msg:   Client disconnect detected in tls msg dispatcher.
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:     info:
> ipc_proxy_remove_provider:        ipc proxy connection for client
> ca8df213-6da7-4c42-8cb3-b8bc0887f2ce pid 21815 destroyed because cluster
> node disconnected.
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:     info:
> cancel_recurring_action:  Cancelling ocf operation
> monitor_all_monitor_191000
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_send_tls:     Connection terminated rc = -53
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_send_tls:     Connection terminated rc = -10
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> crm_remote_send:  Failed to send remote msg, rc = -10
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> lrmd_tls_send_msg:        Failed to send remote lrmd tls msg, rc = -10
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:  warning:
> send_client_notify:       Notification of client
> remote-lrmd-ip-10-237-223-67:3121/b6034d3a-e296-492f-b296-725735d17e22
> failed
> Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:   notice:
> lrmd_remote_client_destroy:       LRMD client disconnecting remote client -
> name: remote-lrmd-ip-10-237-223-67:3121 id:
> b6034d3a-e296-492f-b296-725735d17e22
> Sep 27 17:19:35 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> ipc_proxy_accept: No ipc providers available for uid 0 gid 0
> Sep 27 17:19:35 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> handle_new_connection:    Error in connection setup (19626-21815-14):
> Remote I/O error (121)
> Sep 27 17:19:50 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> ipc_proxy_accept: No ipc providers available for uid 0 gid 0
> Sep 27 17:19:50 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
> handle_new_connection:    Error in connection setup (19626-21815-14):
> Remote I/O error (121)
> [...]

-- 
Jan (Poki)