[Pacemaker] Intermittent Failovers: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)

Zach Wolf ZWolf at doublepositive.com
Mon Nov 10 09:32:45 EST 2014


Hey Team,

I'm seeing strange intermittent failovers on a two-node cluster, roughly once every week or two. When it happens, both nodes become unavailable: one node is marked offline and the other is shown as unclean. Any help on this would be greatly appreciated. Thanks.

Running Ubuntu 12.04 (64-bit)
Pacemaker 1.1.6-2ubuntu3.3
Corosync 1.4.2-2ubuntu0.2

Here are the logs:
Nov 08 14:26:26 corosync [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x12bebe0, async-conn=0x12bebe0) left
Nov 08 14:26:26 corosync [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:27 corosync [pcmk  ] info: pcmk_ipc_exit: Client attrd (conn=0x12d0230, async-conn=0x12d0230) left
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc_exit: Client cib (conn=0x12c7d80, async-conn=0x12c7d80) left
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x12c3a20, async-conn=0x12c3a20) left
Nov 08 14:26:32 corosync [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12bebe0 for stonith-ng/0
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12c2f40 for attrd/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12c72a0 for cib/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Sending membership update 12 to cib
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12cb600 for crmd/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Sending membership update 12 to crmd
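
To make the sequence in the excerpt easier to follow, here is a minimal Python sketch that tallies, per daemon, how often it left, how often message delivery to it failed, and how often it reconnected. The regular expressions are assumptions based only on the lines quoted above, and the function name summarize is made up for illustration; other Pacemaker or Corosync versions may word these messages differently.

```python
import re

# Log excerpt copied verbatim from the lines quoted above.
LOG = """\
Nov 08 14:26:26 corosync [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x12bebe0, async-conn=0x12bebe0) left
Nov 08 14:26:26 corosync [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:27 corosync [pcmk  ] info: pcmk_ipc_exit: Client attrd (conn=0x12d0230, async-conn=0x12d0230) left
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc_exit: Client cib (conn=0x12c7d80, async-conn=0x12c7d80) left
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x12c3a20, async-conn=0x12c3a20) left
Nov 08 14:26:32 corosync [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12bebe0 for stonith-ng/0
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12c2f40 for attrd/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12c72a0 for cib/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Sending membership update 12 to cib
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12cb600 for crmd/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Sending membership update 12 to crmd
"""

# Patterns are assumptions derived from the message wording in this excerpt only.
PATTERNS = (
    ("left", re.compile(r"pcmk_ipc_exit: Client (\S+) \(conn=")),
    ("failed", re.compile(r"route_ais_message: Sending message to local\.(\S+) failed")),
    ("rejoined", re.compile(r"pcmk_ipc: Recorded connection \S+ for ([^/\s]+)/")),
)


def summarize(log_text):
    """Count left / failed / rejoined events for each daemon named in the log."""
    summary = {}
    for line in log_text.splitlines():
        for key, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts = summary.setdefault(
                    match.group(1), {"left": 0, "failed": 0, "rejoined": 0}
                )
                counts[key] += 1
    return summary


if __name__ == "__main__":
    for daemon, counts in summarize(LOG).items():
        print(daemon, counts)
```

Running the same function over the full corosync log from each incident should show whether crmd is always the first client to leave, as it is in this excerpt.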

Output of crm configure show:
node p-sbc3 \
        attributes standby="off"
node p-sbc4 \
        attributes standby="off"
primitive fs lsb:FSSofia \
        op monitor interval="2s" enabled="true" timeout="10s" on-fail="standby" \
        meta target-role="Started"
primitive fs-ip ocf:heartbeat:IPaddr2 \
        params ip="10.100.0.90" nic="eth0:0" cidr_netmask="24" \
        op monitor interval="10s"
primitive fs-ip2 ocf:heartbeat:IPaddr2 \
        params ip="10.100.0.99" nic="eth0:1" cidr_netmask="24" \
        op monitor interval="10s"
group cluster_services fs-ip fs-ip2 fs \
        meta target-role="Started"
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1348755080" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"