[Pacemaker] split brain - after network recovery - resources can still be migrated

Andrei Borzenkov arvidjaar at gmail.com
Sun Oct 26 16:32:51 UTC 2014


On Sun, 26 Oct 2014 12:01:03 +0100
Vladimir <ml at foomx.de> wrote:

> On Sat, 25 Oct 2014 19:11:02 -0400
> Digimer <lists at alteeve.ca> wrote:
> 
> > On 25/10/14 06:35 PM, Vladimir wrote:
> > > On Sat, 25 Oct 2014 17:30:07 -0400
> > > Digimer <lists at alteeve.ca> wrote:
> > >
> > >> On 25/10/14 05:09 PM, Vladimir wrote:
> > >>> Hi,
> > >>>
> > >>> currently I'm testing a 2 node setup using ubuntu trusty.
> > >>>
> > >>> # The scenario:
> > >>>
> > >>> All communication links between the 2 nodes are cut off. This
> > >>> results in a split brain situation and both nodes take their
> > >>> resources online.
> > >>>
> > >>> When the communication links come back, I see the following behaviour:
> > >>>
> > >>> At the DRBD level the split brain is detected and the device is
> > >>> disconnected on both nodes because of "after-sb-2pri disconnect"
> > >>> and then it goes to StandAlone ConnectionState.
> > >>>
> > >>> I'm wondering why pacemaker does not let the resources fail.
> > >>> It is still possible to migrate resources between the nodes
> > >>> although they're in StandAlone ConnectionState. After a split
> > >>> brain that's not what I want.
> > >>>
> > >>> Is this the expected behaviour? Is it possible to let the
> > >>> resources fail after the network recovery to avoid further data
> > >>> corruption?
> > >>>
> > >>> (At the moment I can't use resource or node level fencing in my
> > >>> setup.)
> > >>>
> > >>> Here the main part of my config:
> > >>>
> > >>> #> dpkg -l | awk '$2 ~ /^(pacem|coro|drbd|libqb)/{print $2,$3}'
> > >>> corosync 2.3.3-1ubuntu1
> > >>> drbd8-utils 2:8.4.4-1ubuntu1
> > >>> libqb-dev 0.16.0.real-1ubuntu3
> > >>> libqb0 0.16.0.real-1ubuntu3
> > >>> pacemaker 1.1.10+git20130802-1ubuntu2.1
> > >>> pacemaker-cli-utils 1.1.10+git20130802-1ubuntu2.1
> > >>>
> > >>> # pacemaker
> > >>> primitive drbd-mysql ocf:linbit:drbd \
> > >>> params drbd_resource="mysql" \
> > >>> op monitor interval="29s" role="Master" \
> > >>> op monitor interval="30s" role="Slave"
> > >>>
> > >>> ms ms-drbd-mysql drbd-mysql \
> > >>> meta master-max="1" master-node-max="1" clone-max="2" \
> > >>> clone-node-max="1" notify="true"
> > >>
> > >> Split-brains are prevented by using reliable fencing (aka stonith).
> > >> You configure stonith in pacemaker (using IPMI/iRMC/iLO/etc,
> > >> switched PDUs, etc). Then you configure DRBD to use the
> > >> crm-fence-peer.sh fence-handler and you set the fencing policy to
> > >> 'resource-and-stonith;'.
> > >>
> > >> This way, if all links fail, both nodes block and call a fence. The
> > >> faster one fences (powers off) the slower, and then it begins
> > >> recovery, assured that the peer is not doing the same.
> > >>
> > >> Without stonith/fencing, there is no defined behaviour. You
> > >> will get split-brains and that is that. Consider: both nodes lose
> > >> contact with their peer. Without fencing, both must assume the peer
> > >> is dead and thus take over resources.
> > >
> > > That split brains can occur in such a setup is clear. But I
> > > would expect pacemaker to stop the drbd resource when the link
> > > between the cluster nodes is re-established instead of continuing
> > > to run it.
> > 
> > DRBD will refuse to reconnect until it is told which node's data to 
> > delete. This is data loss and cannot be safely automated.
> 
> Sorry if I described it unclearly, but I don't want pacemaker to do an
> automatic split brain recovery. That would not make any sense to me
> either. This decision has to be taken by an administrator.
> 
> But is it possible to configure pacemaker to do the following?
>  
> - if there are 2 nodes which can see and communicate with each other
>   AND
> - if their disk state is not UpToDate/UpToDate (typically after a split
>   brain)
> - then let the drbd resource fail because something is obviously broken and
>   an administrator has to decide how to continue.
> 

This would require resource agent support. But it looks like the current
resource agent relies on fencing to resolve split brain situations. As
long as the resource agent itself does not indicate a resource failure,
there is nothing pacemaker can do.
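
(For illustration, node-level fencing in Pacemaker would look roughly like
the sketch below. The stonith:external/ipmi plugin is just one example of
the devices Digimer mentioned; addresses and credentials are placeholders,
so substitute whatever fence hardware you actually have.)

# pacemaker (sketch only, placeholder values)
primitive st-node1 stonith:external/ipmi \
        params hostname="node1" ipaddr="10.0.0.1" userid="admin" passwd="secret" \
        op monitor interval="60s"
primitive st-node2 stonith:external/ipmi \
        params hostname="node2" ipaddr="10.0.0.2" userid="admin" passwd="secret" \
        op monitor interval="60s"
# keep each fence device off the node it is meant to kill
location l-st-node1 st-node1 -inf: node1
location l-st-node2 st-node2 -inf: node2
property stonith-enabled="true"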


> > >> This is why stonith is required in clusters. Even with quorum, you
> > >> can't assume anything about the state of the peer until it is
> > >> fenced, so it would only give you a false sense of security.
> > >
> > > Maybe I can use resource level fencing.
> > 
> > You need node-level fencing.
> 
> I know node level fencing is more secure. But shouldn't resource level
> fencing also work here? e.g.
> (http://www.drbd.org/users-guide/s-pacemaker-fencing.html) 
> 
> Currently I can't use IPMI, APC switches or a shared storage device
> for fencing, at most fencing via SSH. But from what I've read, this is
> also not recommended for production setups.
> 
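
Resource-level fencing as described in that guide is configured on the
DRBD side, roughly like this (a sketch; the handler scripts are the ones
normally shipped with drbd8-utils, so check the paths on your nodes):

# drbd (e.g. /etc/drbd.d/mysql.res), resource-level fencing only
resource mysql {
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}

Be aware that crm-fence-peer.sh works by placing a location constraint in
the CIB, so as far as I can tell it only protects you while the nodes can
still reach each other over at least one cluster link. If every link is
down, as in your test, neither node can constrain the other, which is why
Digimer recommends 'resource-and-stonith' together with real stonith.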

You could try the meatware stonith agent. It does exactly what you want:
it freezes further processing until the administrator manually declares
one node as down.
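
A minimal sketch of that, assuming the cluster-glue stonith plugins are
installed (node names are placeholders):

# pacemaker
primitive st-meat stonith:meatware \
        params hostlist="node1 node2" \
        op monitor interval="60s"
property stonith-enabled="true"

When a fence is required the cluster logs a message and waits; after you
have verified that the node really is down, you acknowledge it with
'meatclient -c <nodename>' on the surviving node.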



