[Pacemaker] Mail notification for fencing action

imnotpc imnotpc at rock3d.net
Wed Jun 15 17:42:56 EDT 2011


On Wednesday, June 15, 2011 16:26:56 mark - pacemaker list wrote:
> On Wed, Jun 15, 2011 at 12:24 PM, imnotpc <imnotpc at rock3d.net> wrote:
> > What I was thinking is that the DC is never fenced
> 
> Is this actually the case?  It would sure explain the one "gotcha" I've
> never been able to work around in a three node cluster with stonith/SBD. 
> If you unplug the network cable from the DC (but it and the other nodes
> all still see the SBD disk via their other NIC(s)), the DC of course
> becomes completely isolated.  It will fence one of the still good nodes
> right away, and the surviving node that still has network connectivity
> will become DC. So, you have two DCs, the original one which is
> disconnected from the network and your newly elected one (not really
> elected, just took over because it's the last host left that has network).
>  When the just-fenced node comes back up, you get quorum with the new DC
> and your disconnected DC finally gets shot.
> 
> For any non-DC node, you get exactly the behavior you'd expect, where
> unplugging its network cable gets it fenced and everyone else stays happy.
>  I'd hoped for a situation where unplugging the DC would have the other two
> say, "well, our DC is gone, but we can see each other so he needs to be
> fenced".  Maybe I've just missed a necessary timeout setting somewhere to
> delay the isolated DC from fencing a good node so quickly?
> 
> Sorry, I guess that's a thread hijack, but I've looked and googled and
> never been able to find anything that says DCs don't get fenced,
> so this has confused me for a bit.
> 
> Regards,
> Mark

Oh I wasn't making a pronouncement that DCs are always up and unique. Dejan 
indicates they can fail and you've shown that there can be more than one. My 
point was that "as designed"/conceptually there should only be one and it 
should always be running. I think your example does in a way make my point. If 
every cluster had a unique notifying agent that was always running (or 
immediately restarted) and you suddenly had messages from multiple agents you 
would immediately know what had happened as opposed to wading through a flood of 
mail from each node or not getting anything at all. I chose the DC as an 
example of an agent that would meet these needs as opposed to resources which 
work poorly in this role.
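
To make the idea a bit more concrete, here is a rough sketch in Python. It 
assumes each notification mail can be reduced to a timestamp and the name of 
the agent that sent it. The Notice record and both function names are made up 
for illustration; none of this is Pacemaker API.

```python
from collections import namedtuple

# Hypothetical record for one notification mail (not a Pacemaker structure).
Notice = namedtuple("Notice", ["timestamp", "agent", "text"])


def agents_seen(notices, window_start, window_end):
    """Return the distinct agents that sent a notice inside the window."""
    return {
        n.agent
        for n in notices
        if window_start <= n.timestamp <= window_end
    }


def looks_like_split(notices, window_start, window_end):
    """True when more than one agent acted as the notifier in the window.

    With a single always-running notifier, more than one sender in a short
    window suggests the cluster has more than one node acting as DC.
    """
    return len(agents_seen(notices, window_start, window_end)) > 1
```

Two mails from different agents a few seconds apart would make 
looks_like_split() return True, which is the "you would immediately know" 
case described above.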

Jeff



