[ClusterLabs] pacemaker_remoted XML parse error

Radoslaw Garbacz radoslaw.garbacz at xtremedatainc.com
Tue Sep 27 15:36:53 EDT 2016


Hi,

I encountered the same problem with pacemaker built from github at around
August 22.

Remote nodes go offline occasionally and stay so, their logs show same
errors. The cluster is on AWS ec2 instances, the network works and is an
unlikely reason.

Have there be any commits on github recently (after August 22) addressing
this issue?


Logs:
[...]
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_abort:        crm_remote_header: Triggered assert at remote.c:119 :
endian == ENDIAN_LOCAL
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_remote_header:        Invalid message detected, endian mismatch:
badadbbd is neither 63646330 nor the swab'd 30636463
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_abort:        crm_remote_header: Triggered assert at remote.c:119 :
endian == ENDIAN_LOCAL
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_remote_header:        Invalid message detected, endian mismatch:
badadbbd is neither 63646330 nor the swab'd 30636463
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_abort:        crm_remote_header: Triggered assert at remote.c:119 :
endian == ENDIAN_LOCAL
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_remote_header:        Invalid message detected, endian mismatch:
badadbbd is neither 63646330 nor the swab'd 30636463
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:     info:
lrmd_remote_client_msg:   Client disconnect detected in tls msg dispatcher.
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:     info:
ipc_proxy_remove_provider:        ipc proxy connection for client
ca8df213-6da7-4c42-8cb3-b8bc0887f2ce pid 21815 destroyed because cluster
node disconnected.
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:     info:
cancel_recurring_action:  Cancelling ocf operation
monitor_all_monitor_191000
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_send_tls:     Connection terminated rc = -53
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_send_tls:     Connection terminated rc = -10
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
crm_remote_send:  Failed to send remote msg, rc = -10
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
lrmd_tls_send_msg:        Failed to send remote lrmd tls msg, rc = -10
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:  warning:
send_client_notify:       Notification of client
remote-lrmd-ip-10-237-223-67:3121/b6034d3a-e296-492f-b296-725735d17e22
failed
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted:   notice:
lrmd_remote_client_destroy:       LRMD client disconnecting remote client -
name: remote-lrmd-ip-10-237-223-67:3121 id:
b6034d3a-e296-492f-b296-725735d17e22
Sep 27 17:19:35 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
ipc_proxy_accept: No ipc providers available for uid 0 gid 0
Sep 27 17:19:35 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
handle_new_connection:    Error in connection setup (19626-21815-14):
Remote I/O error (121)
Sep 27 17:19:50 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
ipc_proxy_accept: No ipc providers available for uid 0 gid 0
Sep 27 17:19:50 [19626] ip-10-237-223-67 pacemaker_remoted:    error:
handle_new_connection:    Error in connection setup (19626-21815-14):
Remote I/O error (121)
[...]




On Thu, Jun 9, 2016 at 12:24 AM, Narayanamoorthy Srinivasan <
narayanamoorthys at gmail.com> wrote:

> Don't see any issues in network traffic.
>
> Some more logs where the XML tags are incomplete:
>
> 2016-06-09T03:06:03.096449+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_rsc_op id="fs-postgresql_last_0" operation_key="fs-postgresql_stop_0"
> operation="stop" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.10" transition-key="225:116:0:
> 8fbf83fd-241b-4623-8bbe-31d92e4dfce1" transition-magic="0:0;225:116:
> 0:8fbf83fd-241b-4623-8bbe-31d92e4dfce1" on_node="d00-50-56-94-24-dd"
> call-id="489" rc-code="0" op-status="0" interval="0" last-run="1459491026"
> last-rc-change="1459491026" exec-time="158" queue-time="0"
> op-digest="dfb0c861
> 2016-06-09T03:06:03.097136+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_rsc_op id="fs-postgresql_last_failure_0"
> operation_key="fs-postgresql_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
> transition-key="41:4:7:8fbf83fd-241b-4623-8bbe-31d92e4dfce1"
> transition-magic="0:0;41:4:7:8fbf83fd-241b-4623-8bbe-31d92e4dfce1"
> on_node="d00-50-56-94-24-dd" call-id="5" rc-code="0" op-status="0"
> interval="0" last-run="1459429072" last-rc-change="1459429072"
> exec-time="315" queue-time="0" op-digest="df
> 2016-06-09T03:06:03.097361+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_rsc_op id="fs-postgresql_monitor_10000" operation_key="fs-postgresql_monitor_10000"
> operation="monitor" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.10" transition-key="224:107:0:
> 8fbf83fd-241b-4623-8bbe-31d92e4dfce1" transition-magic="0:0;224:107:
> 0:8fbf83fd-241b-4623-8bbe-31d92e4dfce1" on_node="d00-50-56-94-24-dd"
> call-id="365" rc-code="0" op-status="0" interval="10000"
> last-rc-change="1459490849" exec-time="185" queue-time="0"
> op-digest="cd8d3642c
> 2016-06-09T03:06:03.097582+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> </lrm_resource>
> 2016-06-09T03:06:03.097690+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_resource id="vip-admin-database-default-proposal-controller"
> type="IPaddr2" class="ocf" provider="heartbeat">
> 2016-06-09T03:06:03.097797+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_rsc_op id="vip-admin-database-default-proposal-controller_last_0"
> operation_key="vip-admin-database-default-proposal-controller_stop_0"
> operation="stop" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.10" transition-key="228:116:0:
> 8fbf83fd-241b-4623-8bbe-31d92e4dfce1" transition-magic="0:0;228:116:
> 0:8fbf83fd-241b-4623-8bbe-31d92e4dfce1" on_node="d00-50-56-94-24-dd"
> call-id="487" rc-code="0" op-status="0" interval="0" last-run="1459491026"
> last-rc-chan
> 2016-06-09T03:06:03.098013+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_rsc_op id="vip-admin-database-default-proposal-controller_monitor_10000"
> operation_key="vip-admin-database-default-proposal-controller_monitor_10000"
> operation="monitor" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.10" transition-key="227:107:0:
> 8fbf83fd-241b-4623-8bbe-31d92e4dfce1" transition-magic="0:0;227:107:
> 0:8fbf83fd-241b-4623-8bbe-31d92e4dfce1" on_node="d00-50-56-94-24-dd"
> call-id="369" rc-code="0" op-status="0" interval="10000" last-rc-chang
> 2016-06-09T03:06:03.098230+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> </lrm_resource>
> 2016-06-09T03:06:03.098337+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_resource id="postgresql" type="pgsql" class="ocf" provider="heartbeat">
> 2016-06-09T03:06:03.098468+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_rsc_op id="postgresql_last_0" operation_key="postgresql_stop_0"
> operation="stop" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.10" transition-key="231:116:0:
> 8fbf83fd-241b-4623-8bbe-31d92e4dfce1" transition-magic="0:0;231:116:
> 0:8fbf83fd-241b-4623-8bbe-31d92e4dfce1" on_node="d00-50-56-94-24-dd"
> call-id="481" rc-code="0" op-status="0" interval="0" last-run="1459491025"
> last-rc-change="1459491025" exec-time="1334" queue-time="0"
> op-digest="f2317cad3d54c
> 2016-06-09T03:06:03.099061+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6153]:    error: Partial
> <lrm_rsc_op id="postgresql_monitor_10000" operation_key="postgresql_monitor_10000"
> operation="monitor" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.10" transition-key="230:107:0:
> 8fbf83fd-241b-4623-8bbe-31d92e4dfce1" transition-magic="0:0;230:107:
> 0:8fbf83fd-241b-4623-8bbe-31d92e4dfce1" on_node="d00-50-56-94-24-dd"
> call-id="372" rc-code="0" op-status="0" interval="10000"
> last-rc-change="1459490852" exec-time="424" queue-time="0"
> op-digest="873ed4f07792aa8
>
>
> On Wed, Jun 8, 2016 at 10:28 PM, Narayanamoorthy Srinivasan <
> narayanamoorthys at gmail.com> wrote:
>
>> No recent network changes. Will check for abnormal traffic using
>> wireshark.
>>
>> I also notice that the XML lines are partial (no ending '>', closing "
>> and sometimes partial words) in logs. Any lines > 472 characters are
>> truncated to 472 characters. Wondering is it due to anyother limitations.
>>
>> I can post some line tomorrow when i am back to work.
>>
>>
>> On Wed, Jun 8, 2016 at 8:00 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>>> On 06/08/2016 06:14 AM, Narayanamoorthy Srinivasan wrote:
>>> > I have a pacemaker cluster with two pacemaker remote nodes. Recently
>>> the
>>> > remote nodes started throwing below errors and SDB started
>>> self-fencing.
>>> > Appreciate if someone throws light on what could be the issue and the
>>> fix.
>>> >
>>> > OS - SLES 12 SP1
>>> > Pacemaker Remote version - pacemaker-remote-1.1.13-14.7.x86_64
>>> >
>>> > 2016-06-08T14:11:46.009073+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error: Entity: line 1: parser
>>> > error : AttValue: ' expected
>>> > 2016-06-08T14:11:46.009314+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error:
>>> > key="neutron-ha-tool_monitor_0" operation="monitor"
>>> > crm-debug-origin="do_update_
>>> > 2016-06-08T14:11:46.009443+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error:
>>> >                                                      ^
>>> > 2016-06-08T14:11:46.009567+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error: Entity: line 1: parser
>>> > error : attributes construct error
>>> > 2016-06-08T14:11:46.009697+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error:
>>> > key="neutron-ha-tool_monitor_0" operation="monitor"
>>> > crm-debug-origin="do_update_
>>> > 2016-06-08T14:11:46.009824+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error:
>>> >                                                      ^
>>> > 2016-06-08T14:11:46.009948+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error: Entity: line 1: parser
>>> > error : Couldn't find end of Start Tag lrm_rsc_op line 1
>>> > 2016-06-08T14:11:46.010070+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error:
>>> > key="neutron-ha-tool_monitor_0" operation="monitor"
>>> > crm-debug-origin="do_update_
>>> > 2016-06-08T14:11:46.010191+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error:
>>> >                                                      ^
>>> > 2016-06-08T14:11:46.010460+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error: Entity: line 1: parser
>>> > error : Premature end of data in tag lrm_resource line 1
>>> > 2016-06-08T14:11:46.010718+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error:
>>> > key="neutron-ha-tool_monitor_0" operation="monitor"
>>> > crm-debug-origin="do_update_
>>> > 2016-06-08T14:11:46.010977+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error:
>>> >                                                      ^
>>> > 2016-06-08T14:11:46.011234+05:30 d18-fb-7b-18-f1-8e
>>> > pacemaker_remoted[6190]:    error: XML Error: Entity: line 1: parser
>>> > error : Premature end of data in tag lrm_resources line 1
>>> >
>>> >
>>> > --
>>> > Thanks & Regards
>>> > Moorthy
>>>
>>> This sounds like the network traffic between the cluster nodes and the
>>> remote nodes is being corrupted. Have there been any network changes
>>> lately? Switch/firewall/etc. equipment/settings? MTU?
>>>
>>> You could try using a packet sniffer such as wireshark to see if the
>>> traffic looks abnormal in some way. The payload is XML so it should be
>>> more or less readable.
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>>
>>
>> --
>> Thanks & Regards
>> Moorthy
>>
>
>
>
> --
> Thanks & Regards
> Moorthy
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 
Best Regards,

Radoslaw Garbacz
XtremeData Incorporation
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20160927/2fa32b10/attachment-0002.html>


More information about the Users mailing list