[ClusterLabs] Recovering after split-brain
kgaillot at redhat.com
Mon Jun 20 16:54:35 EDT 2016
On 06/20/2016 08:30 AM, Nikhil Utane wrote:
> For our solution we are making a conscious choice to not use
> quorum/fencing as for us service availability is more important than
> having 2 nodes take up the same active role. Split-brain is not an issue
> for us (at least I think so) since we have a second line of
> defense. We have clients who can connect to only one of the two active
> nodes. So in that sense, even if we end up with 2 nodes becoming active,
> since the clients can connect to only one of the active nodes, we should
> not have any issue.
> Now my question is what happens after recovering from split-brain since
> the resource will be active on both nodes. From the application's point of
> view we want to be able to find out which node is servicing the clients
> and keep that operational and make the other one as standby.
> Does Pacemaker make it easy to do this kind of thing through some means?
> Are there any issues that I am completely unaware of due to letting
> split-brain occur?
Usually, split brain is most destructive when the two nodes need to
synchronize data in some way (DRBD, shared storage, cluster file
systems, replication, etc.). If both nodes attempt to write without
being able to coordinate with each other, it usually results in
incompatible data stores that cause big recovery headaches (sometimes
throw-it-away-and-restore-from-backup headaches). For a resource such as
a floating IP, the consequences are less severe, but it can result in
the service becoming unusable (if both nodes claim the IP, packets go
every which way).
In the scenario you describe, if a split brain occurs and then is
resolved, Pacemaker will likely stop the services on both nodes, then
start them on one node.
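The stop-everywhere-then-start-once behavior corresponds to Pacemaker's
"multiple-active" resource meta-attribute, whose default value is
"stop_start"; the other long-standing values are "stop_only" and "block".
As a rough illustration only, the Python sketch below models what each
policy implies for a resource found active on two nodes. The function
name, the node names, and the (action, node) tuples are hypothetical and
are not Pacemaker APIs.

```python
def plan_recovery(active_nodes, preferred_node, policy="stop_start"):
    """Toy model of recovery for a resource found active on several nodes.

    Returns a list of (action, node) tuples. This is an illustration of
    the three policies, not how Pacemaker is implemented.
    """
    nodes = sorted(active_nodes)
    if len(nodes) <= 1:
        return []  # resource is not multiply active; nothing to recover
    if policy == "block":
        return []  # take no action; an administrator has to step in
    stops = [("stop", node) for node in nodes]
    if policy == "stop_only":
        return stops  # stop every instance and leave it stopped
    if policy == "stop_start":
        # stop every instance, then start a single one
        return stops + [("start", preferred_node)]
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for action, node in plan_recovery({"node1", "node2"}, "node1"):
        print(action, node)
```

Note that with the default policy even the node that was correctly
serving clients sees its instance stopped briefly before one is started
again.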
The main questions I see are (1) does your service require any sort of
coordination/synchronization between the two nodes, especially of data;
and (2) how do clients know which node to connect to?
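To make the second question concrete: if the application itself has to
decide which node stays active once the partition heals, one possible
approach is to compare how many clients each node is currently serving.
The sketch below is an assumption for illustration, not something
Pacemaker provides; the function name, the node names, and the idea of
a per-node client count are all hypothetical.

```python
def choose_survivor(client_counts):
    """Pick the node that should stay active after a split brain heals.

    client_counts maps a node name to the number of clients it serves.
    The node with the most clients wins; ties go to the alphabetically
    first node name so that both sides reach the same decision.
    """
    if not client_counts:
        raise ValueError("no nodes given")
    survivor = min(client_counts, key=lambda n: (-client_counts[n], n))
    standby = sorted(n for n in client_counts if n != survivor)
    return survivor, standby


if __name__ == "__main__":
    survivor, standby = choose_survivor({"node1": 0, "node2": 12})
    print("keep active:", survivor, "| move to standby:", standby)
```

The deterministic tie-break matters because each node evaluates the
rule independently and both must arrive at the same answer.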