<div dir="ltr"><div><div><div><div><div><div>Hi,<br><br></div>I posted a question about this error earlier as a reply to another thread, but that thread was old and the question got no answer, so I think it may have been missed. Apologies for repeating it.<br><br></div>Regarding the problem:<br></div>I have a cluster, and when it grows to around 40 remote nodes, some remote nodes go offline after a while. Their logs report message errors (see below); the other logs give no indication of anything wrong.<br><br></div>Details:<br></div>- 40 EC2 m3.xlarge nodes: 1 corosync ring member, 39 remote<br></div><div>- maybe irrelevant, but either the "cib" or the "pengine" process goes to ~100% CPU<br></div><div>- it does not happen immediately<br></div><div>- a smaller cluster (~20 remote nodes) does not show any problems<br></div><div>- pacemaker: 1.1.15-1.1f8e642.git.el6.x86_64<br></div><div>- corosync: 2.4.1-1.2.0da1.el6.x86_64<br>- libqb: 1.0.0-1.28.4dff.el6.x86_64<br></div><div>- CentOS 6<br></div><div><br></div><div>Logs:<br><br>[...]<br>Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:
error: crm_abort: crm_remote_header: Triggered assert at
remote.c:119 : endian == ENDIAN_LOCAL<br>Sep 27 17:18:31 [19626]
ip-10-237-223-67 pacemaker_remoted: error: crm_remote_header:
Invalid message detected, endian mismatch: badadbbd is neither 63646330
nor the swab'd 30636463<br>Sep 27 17:18:31 [19626] ip-10-237-223-67
pacemaker_remoted: error: crm_abort: crm_remote_header:
Triggered assert at remote.c:119 : endian == ENDIAN_LOCAL<br>Sep 27
17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error:
crm_remote_header: Invalid message detected, endian mismatch:
badadbbd is neither 63646330 nor the swab'd 30636463<br>Sep 27 17:18:31
[19626] ip-10-237-223-67 pacemaker_remoted: error: crm_abort:
crm_remote_header: Triggered assert at remote.c:119 : endian ==
ENDIAN_LOCAL<br>Sep 27 17:18:31 [19626] ip-10-237-223-67
pacemaker_remoted: error: crm_remote_header: Invalid message
detected, endian mismatch: badadbbd is neither 63646330 nor the swab'd
30636463<br>Sep 27 17:18:31 [19626] ip-10-237-223-67
pacemaker_remoted: info: lrmd_remote_client_msg: Client disconnect
detected in tls msg dispatcher.<br>Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: info: ipc_proxy_remove_provider: ipc proxy connection for client ca8df213-6da7-4c42-8cb3-b8bc0887f2ce pid 21815 destroyed because cluster node disconnected.<br>Sep
27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: info:
cancel_recurring_action: Cancelling ocf operation
monitor_all_monitor_191000<br>Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_send_tls: Connection terminated rc = -53<br>Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_send_tls: Connection terminated rc = -10<br>Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_remote_send: Failed to send remote msg, rc = -10<br>Sep
27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error:
lrmd_tls_send_msg: Failed to send remote lrmd tls msg, rc = -10<br>Sep
27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: warning:
send_client_notify: Notification of client
remote-lrmd-ip-10-237-223-67:3121/b6034d3a-e296-492f-b296-725735d17e22 failed<br>Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-ip-10-237-223-67:3121 id: b6034d3a-e296-492f-b296-725735d17e22<br>Sep
27 17:19:35 [19626] ip-10-237-223-67 pacemaker_remoted: error:
ipc_proxy_accept: No ipc providers available for uid 0 gid 0<br>Sep 27
17:19:35 [19626] ip-10-237-223-67 pacemaker_remoted: error:
handle_new_connection: Error in connection setup (19626-21815-14):
Remote I/O error (121)<br>Sep 27 17:19:50 [19626] ip-10-237-223-67
pacemaker_remoted: error: ipc_proxy_accept: No ipc providers
available for uid 0 gid 0<br>Sep 27 17:19:50 [19626] ip-10-237-223-67
pacemaker_remoted: error: handle_new_connection: Error in
connection setup (19626-21815-14): Remote I/O error (121)<br>[...]<br><br><br></div><div><div><div><div><div><div><div><div><br>-- <br><div class="gmail_signature"><div dir="ltr"><div>Best Regards,<br><br>Radoslaw Garbacz<br></div>XtremeData Incorporation<br></div></div>
</div></div></div></div></div></div></div></div></div>