[Pacemaker] Mail notification for fencing action

Wed Jun 15 17:42:56 EDT 2011

On Wednesday, June 15, 2011 16:26:56 mark - pacemaker list wrote:
> On Wed, Jun 15, 2011 at 12:24 PM, imnotpc <imnotpc at rock3d.net> wrote:
> > What I was thinking is that the DC is never fenced
> 
> Is this actually the case?  It would sure explain the one "gotcha" I've
> never been able to work around in a three node cluster with stonith/SBD. 
> If you unplug the network cable from the DC (but it and the other nodes
> all still see the SBD disk via their other NIC(s)), the DC of course
> becomes completely isolated.  It will fence one of the still good nodes
> right away, and the surviving node that still has network connectivity
> will become DC. So, you have two DCs, the original one which is
> disconnected from the network and your newly elected one (not really
> elected, just took over because it's the last host left that has network).
>  When the just-fenced node comes back up, you get quorum with the new DC
> and your disconnected DC finally gets shot.
> 
> For any non-DC node, you get exactly the behavior you'd expect, where
> unplugging its network cable gets it fenced and everyone else stays happy.
>  I'd hoped for a situation where unplugging the DC would have the other two
> say, "well, our DC is gone, but we can see each other so he needs to be
> fenced".  Maybe I've just missed a necessary timeout setting somewhere to
> delay the isolated DC from fencing a good node so quickly?
> 
> Sorry, I guess that's a thread hijack, but I've looked and googled and
> have never been able to find anything that says DCs don't get fenced,
> so this has confused me for a bit.
> 
> Regards,
> Mark

Oh, I wasn't making a pronouncement that DCs are always up and unique. Dejan 
indicates they can fail, and you've shown that there can be more than one. My 
point was that "as designed"/conceptually there should only be one and it 
should always be running. I think your example does, in a way, make my point. If 
every cluster had a unique notifying agent that was always running (or 
immediately restarted) and you suddenly had messages from multiple agents, you 
would immediately know what had happened, as opposed to wading through a flood of 
mail from each node or not getting anything at all. I chose the DC as an 
example of an agent that would meet these needs, as opposed to resources, which 
work poorly in this role.
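
To make the idea concrete, here is a rough sketch of what the receiving side 
could do with such notifications. It is not anything Pacemaker provides: the 
function names, the (timestamp, sender) message format and the 300 second 
window are all assumptions made up for illustration. It only shows that mail 
from more than one notifying agent in a short period is itself the signal.

```python
# Hypothetical sketch: each notification is a (timestamp_seconds, sender_node)
# tuple. Neither the format nor these function names come from Pacemaker.

def active_notifiers(messages, window_seconds=300):
    """Return the sorted names of agents that sent mail within the window
    ending at the most recent message."""
    if not messages:
        return []
    latest = max(ts for ts, _ in messages)
    recent = {sender for ts, sender in messages if latest - ts <= window_seconds}
    return sorted(recent)


def multiple_agents_seen(messages, window_seconds=300):
    """True when more than one agent is notifying at once, which would
    suggest more than one node currently believes it is the DC."""
    return len(active_notifiers(messages, window_seconds)) > 1


if __name__ == "__main__":
    # node1 is the isolated original DC, node3 the newly promoted one.
    mail = [(0, "node1"), (1000, "node1"), (1010, "node3")]
    print(active_notifiers(mail))
    print(multiple_agents_seen(mail))
```

With a single agent per cluster, the normal case is one sender name and 
anything else stands out right away.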

Jeff