[Pacemaker] Call cib_query failed (-41): Remote node did not respond

Brian J. Murrell brian at interlinx.bc.ca
Wed Jul 4 12:51:41 EDT 2012


On 12-07-04 04:27 AM, Andreas Kurz wrote:
> 
> besides increasing the batch limit to a higher value ... did you also
> tune corosync totem timings?

Not yet.

But a closer look at the logs reveals a bunch of these:

Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Child 25046 spawned to record non-fatal assertion failure line 1594: rc == 0
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Message not sent (-1): <copy t="cib" cib_op="cib_replace" cib_delegated_from="node-4.lab.example.com"
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] WARN: route_ais_message: Sending message to node-4.lab.example.com.cib failed: cluster delivery failed (rc=-1)
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Child 25048 spawned to record non-fatal assertion failure line 1594: rc == 0
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Message not sent (-1): <copy t="cib" cib_op="cib_replace" cib_delegated_from="node-6.lab.example.com"
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] WARN: route_ais_message: Sending message to node-6.lab.example.com.cib failed: cluster delivery failed (rc=-1)
Jun 28 14:56:56 node-2 abrt[25049]: not dumping repeating crash in '/usr/sbin/corosync'
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Child 25050 spawned to record non-fatal assertion failure line 1594: rc == 0
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Message not sent (-1): <copy t="cib" cib_op="cib_replace" cib_delegated_from="node-10.lab.example.com
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] WARN: route_ais_message: Sending message to node-10.lab.example.com.cib failed: cluster delivery failed (rc=-1)
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Child 25051 spawned to record non-fatal assertion failure line 1594: rc == 0
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Message not sent (-1): <copy t="cib" cib_op="cib_replace" cib_delegated_from="node-7.lab.example.com"
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] WARN: route_ais_message: Sending message to node-7.lab.example.com.cib failed: cluster delivery failed (rc=-1)
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Child 25052 spawned to record non-fatal assertion failure line 1594: rc == 0
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Message not sent (-1): <copy t="cib" cib_op="cib_replace" cib_delegated_from="node-4.lab.example.com"
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] WARN: route_ais_message: Sending message to node-4.lab.example.com.cib failed: cluster delivery failed (rc=-1)
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Child 25053 spawned to record non-fatal assertion failure line 1594: rc == 0
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Message not sent (-1): <copy t="cib" cib_op="cib_replace" cib_delegated_from="node-6.lab.example.com"
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] WARN: route_ais_message: Sending message to node-6.lab.example.com.cib failed: cluster delivery failed (rc=-1)
Jun 28 14:56:56 node-2 corosync[30497]:   [pcmk  ] ERROR: send_cluster_msg_raw: Child 25054 spawned to record non-fatal assertion failure line 1594: rc == 0

Google could not seem to turn up anything about the assertion message.

I also saw these today after setting batch-limit to 1 and repeating my
8-node (4 active, 4 idle) experiment.
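The same pattern repeats for several peers, so it can help to tally which nodes the failed cib_replace copies were aimed at and how often the assertion at line 1594 fired. Below is a minimal Python sketch that assumes the syslog format quoted above. The regular expressions and the `summarize`/`report` helpers are hypothetical illustrations and are not part of Pacemaker or corosync.

```python
import re
from collections import Counter

# Hypothetical patterns modelled on the pcmk log lines quoted in this message.
# "... route_ais_message: Sending message to <node>.<service> failed: ..."
FAILED_SEND = re.compile(
    r"route_ais_message: Sending message to (?P<node>\S+)\.(?P<svc>\w+) failed"
)
# "... Child <pid> spawned to record non-fatal assertion failure line <n>: ..."
ASSERTION = re.compile(
    r"spawned to record non-fatal assertion failure line (?P<line>\d+)"
)


def summarize(lines):
    """Tally failed deliveries per (node, service) and assertion records per source line."""
    failures = Counter()
    assertions = Counter()
    for line in lines:
        m = FAILED_SEND.search(line)
        if m:
            failures[(m.group("node"), m.group("svc"))] += 1
            continue
        m = ASSERTION.search(line)
        if m:
            assertions[int(m.group("line"))] += 1
    return failures, assertions


def report(lines):
    """Print the tallies, most frequent delivery failures first."""
    failures, assertions = summarize(lines)
    for (node, svc), count in failures.most_common():
        print(f"{count:5d}  delivery to {node} ({svc}) failed")
    for line_no, count in sorted(assertions.items()):
        print(f"{count:5d}  non-fatal assertion records at line {line_no}")
```

For example, `with open("/var/log/messages", errors="replace") as fh: report(fh)`. The log path is an assumption; use wherever syslog writes corosync messages on the affected node.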

But surely, it is easy to understand why pacemaker would have problems
if corosync is aborting on a failed assertion.

Any clues as to what this one is about?  This is corosync-1.4.1-4.el6_2.3.x86_64.

Cheers,
b.



More information about the Pacemaker mailing list