[Pacemaker] ifdown ethX + corosync + DRBD = split-brain?
Viacheslav Dubrovskyi
dubrsl at gmail.com
Thu Jul 25 06:28:29 EDT 2013
19.07.2013 14:38, Howley, Tom wrote:
> Hi,
>
> I have been doing some testing of a fairly standard pacemaker/corosync setup with DRBD (with resource-level fencing) and have noticed the following in relation to testing network failures:
>
> - Handling of all ports being blocked is OK, based on hundreds of tests.
> - Handling of cable-pulls seems OK, based on only 10 tests.
> - ifdown ethX leads to split-brain roughly 50% of the time due to two underlying issues:
>
> 1. corosync (possibly by design) handles loss of network interface differently to other network failures. I can only see this from the point of view of the logs: "[TOTEM ] The network interface is down.", which is different from cable-pull log, where I don't see that message. I'm guessing this as I don't know the code.
> 2. corosync allows a non-quorate partition, in my case a single node, to update the CIB. This behaviour has been previously confirmed in reply to previous mails on this list and it has been mentioned that there may be improvements in this area in the future. This on its own seems like a bug to me.
>
> My question is: is it possible for me to configure corosync/drbd to handle the ifdown scenario or do I simply have to tell people "do not test with ifdown", as I have seen mentioned in a few places on the web? If I do have to leave out ifdown testing, how can I be sure that I haven't missed out testing some real network failure scenario?
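When you shut down an interface with ifdown, its IP address is removed. As
a result, DRBD cannot bind to that IP.
In a real failure, such as a pulled cable, the address normally stays
configured, so this situation does not arise. So just tell people "do not
test with ifdown".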
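A minimal sketch of the failure mode described above. It uses a plain userspace TCP socket as an analogy, not DRBD's own kernel code. Binding to an address that is not configured on any local interface fails with `EADDRNOTAVAIL`, and that is what a service sees once `ifdown` has removed its address. The helper `can_bind` is hypothetical. The address `192.0.2.10` (TEST-NET-1) is assumed to be unconfigured on the host and stands in for the removed address. On Linux, setting the `net.ipv4.ip_nonlocal_bind` sysctl to 1 would make this bind succeed.

```python
import errno
import socket


def can_bind(addr: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to addr on this host.

    Hypothetical helper for illustration. Port 0 lets the kernel pick a
    free port, so only the address itself is being tested.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))
        return True
    except OSError as exc:
        # EADDRNOTAVAIL: the address is not assigned to any local interface,
        # e.g. because ifdown just removed it.
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise
    finally:
        sock.close()


if __name__ == "__main__":
    # Loopback is normally always configured.
    print("127.0.0.1:", can_bind("127.0.0.1"))
    # Assumption: 192.0.2.10 is not configured locally (stand-in for the
    # address that ifdown removed).
    print("192.0.2.10:", can_bind("192.0.2.10"))
```

A pulled cable only drops the link carrier and leaves the address in place, so the equivalent bind would still succeed in that case.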
--
WBR,
Viacheslav Dubrovskyi