[Pacemaker] 1.1.12: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)

Cédric Dufour - Idiap Research Institute cedric.dufour at idiap.ch
Fri Aug 1 05:06:04 EDT 2014


SOLVED!

[more below]

On 01/08/14 06:30, Andrew Beekhof wrote:
> On 1 Aug 2014, at 2:04 pm, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> On 1 Aug 2014, at 7:47 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>>> On 31 Jul 2014, at 4:46 pm, Cédric Dufour - Idiap Research Institute <cedric.dufour at idiap.ch> wrote:
>>>
>>>> On 31/07/14 00:17, Andrew Beekhof wrote:
>>>>> On 31 Jul 2014, at 2:48 am, Cédric Dufour - Idiap Research Institute <cedric.dufour at idiap.ch> wrote:
>>>>>
>>>>>> After packaging pacemaker 1.1.12 for Debian/Wheezy (alongside corosync 1.4.6 and libqb 0.17.0), I have successfully initialized a new cluster.
>>>>>>
>>>>>> Back to a very simple test cluster, the only problem I have is with fencing, which fails altogether with "route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)" messages:
>>>>>>
>>>>>> root at bc1hs22a01:~ # tail /var/log/corosync.rsyslog
>>>>>> Jul 30 18:41:41 bc1hs22a01 stonith_admin[5411]:   notice: crm_log_args: Invoked: stonith_admin -F bc1hs22a02
>>>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]:   notice: handle_request: Client stonith_admin.5411.fe1388ed wants to fence (off) 'bc1hs22a02' with device '(any)'
>>>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]:   notice: initiate_remote_stonith_op: Initiating remote operation off for bc1hs22a02: 48b69f82-29ad-4c9a-af57-0e60ae5242e4 (0)
>>>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>>> rc=-2 is coming from send_client_ipc(void *conn, const AIS_Message * ais_msg)
>>>>>
>>>>> specifically:
>>>>>
>>>>>  if (conn == NULL) {
>>>>>      rc = -2;
>>>>>
>>>>> So the plugin thinks that stonith-ng isn't connected.
>>>>> More logs?
>>>>>
>>>> I have completed a full restart of the cluster in order to provide the logs at each step; see attached log files:
>>>> (from node_1/DC)
>>>> - node_1-corosync-start.log
>>>> - node_1-pacemaker-start.log
>>>> - node_1-corosync-node_2_join.log
>>>> - node_1-pacemaker-node_2_join.log
>>>> (from node_2)
>>>> - node_2-corosync-start.log
>>>> - node_2-pacemaker-start.log
>>>>
>>>> The problem already shows up in the DC start log (because of a previous fencing attempt), at 08:19:21 and 08:19:42:
>>>>
>>>> root at bc1hs22a01:~ # fgrep 'ipc delivery failed' node_1-corosync-start.log
>>>> Jul 31 08:19:21 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Jul 31 08:19:42 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>>
>>>> While it would seem (to me) that the stonith plugin successfully connected to the CIB:
>>> It's not the CIB that's the issue:
>>>
>>>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>> That's the pacemaker plugin inside corosync (which uses a completely different IPC mechanism).
>> It looks like there is a name mismatch:
>>
>> Jul 31 08:19:20 bc1hs22a01 corosync[31057]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x2543e30 for stonithd/0
>> Jul 31 08:19:20 bc1hs22a01 corosync[31057]:   [pcmk  ] debug: process_ais_message: Msg[1] (dest=local:ais, from=bc1hs22a01:stonithd.31092, remote=true, size=6): 31092
>> ...
>> Jul 31 08:19:21 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>> Jul 31 08:19:42 bc1hs22a01 corosync[31057]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>
>> Could you try the following patch?
> Actually, try this one instead:
>    https://github.com/beekhof/pacemaker/commit/21830a0

This one-line patch did it:

Aug  1 09:48:26 bc1hs22a01 corosync[15681]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1a926c0 for stonith-ng/0
Aug  1 09:48:26 bc1hs22a01 corosync[15681]:   [pcmk  ] info: pcmk_ipc: Sending membership update 120 to stonith-ng

And the previously attempted (and recorded) fencing command worked as soon as the DC started.

Thank you very much for your quick response!
(I can now enjoy Swiss National Day with total peace of mind :-) )

PS: I'll carry out further cluster/fencing tests next week (should you want a more thorough confirmation before pushing your patch to master).

>>> FWIW, the plugin is extremely deprecated; you're encouraged to use pacemaker+cman or to begin working towards corosync2 + pacemakerd.
>>>
>>>

I'll keep this in mind (though it is not so easy to achieve when one wants to stay close to Debian "stable").

Best and thanks again,

Cédric

More information about the Pacemaker mailing list