[Pacemaker] Peers see each other, but never successfully elect a DC.

Andrew Beekhof andrew at beekhof.net
Fri Jan 29 03:12:28 EST 2010


On Fri, Jan 29, 2010 at 6:12 AM, D. J. Draper <draperd7772 at hotmail.com> wrote:
> Hi guys and gals. First time posting here, and I've either got a really
> simple issue or a whopper of a problem, as extensive Googling failed to
> return any other instances of anyone encountering this problem.
>
> Everything I've read about using Pacemaker over the OpenAIS stack pretty
> much states you just write a valid pair of corosync.conf files (openais
> parser is apparently broken right now), update /etc/init.d/openais to use
> the experimental corosync parser, fire up the service, and two nodes should
> form a cluster.
>
> Well, I do the above, successfully fire up the openais service on both
> nodes, then tail /var/log/messages. Both nodes report successfully
> connecting to the CIB, log seeing each other, and even send each other join
> invitations. But their counterparts never acknowledge the invitation nor do
> they elect a DC:

These look like a very bad sign:

Jan 28 22:58:19 node01 crmd: [19336]: ERROR: check_message_sanity: Invalid message 0: (dest=<all>:unknown, from=<all>:unknown.0, compressed=0, size=0, total=0)
Jan 28 22:58:19 node01 crmd: [19336]: ERROR: ais_dispatch: Invalid message (id=0, dest=<all>:unknown, from=<all>:unknown.0): min=592, total=0, size=0, bz2_size=0
Jan 28 22:58:19 node01 crmd: [19336]: WARN: check_message_sanity: Message with no size
Jan 28 22:58:19 node01 crmd: [19336]: ERROR: check_message_sanity: Invalid message 0: (dest=<all>:unknown, from=<all>:unknown.0, compressed=0, size=0, total=0)
Jan 28 22:58:19 node01 crmd: [19336]: ERROR: ais_dispatch: Invalid message (id=0, dest=<all>:unknown, from=<all>:unknown.0): min=592, total=0, size=0, bz2_size=0

(BTW, please send logs as attachments.)

There would appear to be a problem with Corosync's IPC mechanism.
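
For anyone wanting to spot this pattern in their own logs, here is a minimal sketch in Python that pulls the size fields out of an "Invalid message" line like the ones quoted above. The helper names (size_fields, looks_empty) are made up for illustration, and reading "min" as the smallest size crmd will accept is an assumption based only on the log text.

```python
import re

# One ais_dispatch line from the excerpt above, joined onto a single line.
SAMPLE = (
    "Jan 28 22:58:19 node01 crmd: [19336]: ERROR: ais_dispatch: Invalid "
    "message (id=0, dest=<all>:unknown, from=<all>:unknown.0): min=592, "
    "total=0, size=0, bz2_size=0"
)

# Matches the numeric size fields; \b keeps "size" from matching inside "bz2_size".
FIELD_RE = re.compile(r"\b(min|total|size|bz2_size)=(\d+)")


def size_fields(line):
    """Return the size-related fields of a log line as a dict of ints."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


def looks_empty(fields):
    """True when the message arrived with no payload at all (size and total are 0)."""
    return fields.get("size") == 0 and fields.get("total") == 0


if __name__ == "__main__":
    fields = size_fields(SAMPLE)
    print(fields)
    if looks_empty(fields):
        # Assumption: "min" is the smallest size the receiver accepts.
        print("empty message, expected at least", fields.get("min"), "bytes")
```

A completely empty message (size=0, total=0) on every dispatch fits the idea that the transport is handing crmd nothing usable, rather than Pacemaker sending malformed data.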

To try and narrow down the problem...

1. Could you try running "corosync-objctl" after you get the errors from
Pacemaker?

2. If that doesn't work, could you try removing pacemaker from
corosync.conf, restarting corosync (so that only corosync is running), and
then running corosync-objctl again?
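
If it helps to capture the result of either step for the list, the check can be sketched roughly as below. This is only an illustration: run_objctl is a hypothetical helper, the 30 second timeout is an arbitrary choice in case the IPC call hangs, and nothing here interprets the tool's output.

```python
import shutil
import subprocess


def run_objctl(command="corosync-objctl", timeout=30):
    """Run the given command with no arguments and report what happened."""
    path = shutil.which(command)
    if path is None:
        # The tool is not installed or not on PATH.
        return {"found": False, "returncode": None, "output": ""}
    try:
        proc = subprocess.run(
            [path],
            stdin=subprocess.DEVNULL,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hang is worth reporting too when IPC is the suspect.
        return {"found": True, "returncode": None, "output": "timed out"}
    return {
        "found": True,
        "returncode": proc.returncode,
        "output": proc.stdout + proc.stderr,
    }


if __name__ == "__main__":
    result = run_objctl()
    print("found:", result["found"], "returncode:", result["returncode"])
    print(result["output"])
```

Run it once with Pacemaker loaded and once with only corosync running, and attach both outputs.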



