[Pacemaker] node offline after fencing (pacemakerd hangs)
Jake Smith
jsmith at argotec.com
Thu Jul 19 10:05:15 EDT 2012
----- Original Message -----
> From: "Raoul Bhatia [IPAX]" <r.bhatia at ipax.at>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Wednesday, July 18, 2012 10:12:14 AM
> Subject: Re: [Pacemaker] node offline after fencing (pacemakerd hangs)
>
> On 2012-07-18 15:57, Ulrich Leodolter wrote:
> > hi,
> >
> > after adding a second ring to corosync.conf
> > the problem seems to be gone.
> >
> > after killing corosync the node is fenced by
> > the other node. after reboot the cluster is
> > fully operational.
> >
> > is this essential to have at least 2 rings?
> >
> > maybe there is a network timing problem (but can't see
> > error messages)
> > the interface on ring 0 (192.168.20.171) is a bridge.
> > the interface on ring 1 (10.10.10.171) is normal ethernet
> > interface.
>
> I've seen such things with bonding devices under debian 6.0
>
> try something like:
> > auto bond0
> > iface bond0 inet static
> > ...
> >   bond-mode active-backup
> >   bond-miimon 100
> >   bridge_fd 0
> >   bridge_maxwait 0
>
> Another workaround is a "sleep 10" or similar at the beginning
> of the pacemaker script to let bond0 come up.
Same here under Ubuntu. In my case it was OCFS2/dlm under Pacemaker autostarting on boot, but it is the same sort of problem.
Another solution is something like this (it will vary a little on RHEL, I believe):
Disable corosync autostart:
$ sudo update-rc.d -f corosync disable S
Then add 'post-up /etc/init.d/corosync start' to the bonding (or in your case bridged) interface in
/etc/network/interfaces.
^^^^ From:
http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=63617;page=1;sb=post_latest_reply;so=ASC;mh=25;list=linuxha
That way corosync won't start until the interfaces/bridge are actually up.
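For illustration only, a bridged stanza with the post-up hook might look roughly like the sketch below. The address is the ring 0 address from this thread. The names br0 and eth0 and the netmask are my assumptions, so adjust them to your setup. The bridge_fd/bridge_maxwait lines are the ones Raoul suggested above.

```
# /etc/network/interfaces (sketch; br0, eth0 and the netmask are assumed)
auto br0
iface br0 inet static
    address 192.168.20.171
    netmask 255.255.255.0
    bridge_ports eth0
    bridge_fd 0
    bridge_maxwait 0
    # start corosync only once this interface has been brought up
    post-up /etc/init.d/corosync start
```

For a bonded interface the same post-up line would go at the end of the bond0 stanza instead.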
>
> We always go with 2 rings, even when using a NIC bonding.
+1
We use 2 rings, each on a different bond.
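In case it helps, a two-ring totem section looks roughly like this. It is a sketch, not our production config: the bindnetaddr values assume /24 networks around the two addresses mentioned earlier in the thread, and the multicast addresses, ports and rrp_mode are placeholders I picked for the example.

```
# corosync.conf (sketch; networks, mcast addresses and ports are assumed)
totem {
    version: 2
    # redundant ring protocol: passive or active
    rrp_mode: passive

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.20.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.10.10.0
        mcastaddr: 226.94.1.2
        # keep the ports at least 2 apart; corosync also uses mcastport - 1
        mcastport: 5407
    }
}
```

After a restart, corosync-cfgtool -s should report the status of both rings.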
HTH
Jake