[ClusterLabs] Failure of preferred node in a 2 node cluster
Wei Shan
weishan.ang at gmail.com
Mon Apr 30 11:26:00 EDT 2018
Hi!
Thanks for the prompt response.
Due to the Red Hat version, I can't use shared storage, so I'm using a
timer-based (watchdog-only) setup with stonith-watchdog-timeout set to 5
seconds.
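
For reference, the watchdog-only (diskless) sbd setup I'm describing looks
roughly like the sketch below; the watchdog device path is just a placeholder
for whatever the hardware exposes, and the usual guidance is to keep
stonith-watchdog-timeout comfortably above sbd's own watchdog timeout:

    # /etc/sysconfig/sbd -- no SBD_DEVICE set, i.e. diskless/watchdog-only mode
    SBD_WATCHDOG_DEV=/dev/watchdog

    # enable sbd cluster-wide (takes effect after a full cluster restart)
    pcs stonith sbd enable

    # pacemaker side: treat expiry of the lost node's watchdog as a completed fence
    pcs property set stonith-enabled=true
    pcs property set stonith-watchdog-timeout=5
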
On Sun, 29 Apr 2018 at 05:37, Digimer <lists at alteeve.ca> wrote:
> Ah, ok, now I get it.
>
> So node 2 should wait until it's confident that the lost node either
> shut down or was killed by its watchdog timer. After that, it will
> consider the node fenced and proceed with recovery. I don't think ATB
> will factor in here, as the cluster should treat this as a simple "node
> was lost, fencing finally worked, it's safe to recover now" thing.
>
> The node IDs shouldn't matter in this case. What decides the winner is
> who is allowed access to the shared storage. The one that can is allowed
> to keep kicking the watchdog. The one that loses access, assuming it is
> alive at all, should be forced off when the watchdog timer expires.
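
For the archives: what Digimer describes here sounds like sbd's shared-disk
("poison pill") mode, which I can't use in this environment. For completeness,
a rough sketch of that variant, with the device path purely a placeholder:

    # initialise the shared sbd slot device once, from any node
    sbd -d /dev/disk/by-id/EXAMPLE-SHARED-LUN create

    # /etc/sysconfig/sbd on every node
    SBD_DEVICE=/dev/disk/by-id/EXAMPLE-SHARED-LUN
    SBD_WATCHDOG_DEV=/dev/watchdog

    # stonith resource so pacemaker can write the poison pill to the disk
    pcs stonith create fence-sbd fence_sbd devices=/dev/disk/by-id/EXAMPLE-SHARED-LUN
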
>
> digimer
>
>
> On 2018-04-28 09:19 PM, Wei Shan wrote:
> > Hi,
> >
> > I'm using Red Hat Cluster Suite 7 with a watchdog-timer-based fence agent.
> > I understand this is a really bad setup, but this is what the end-user
> > wants.
> >
> > ATB => auto_tie_breaker
> >
> > "When auto_tie_breaker is used in even-member clusters, the failure of
> > the partition containing the auto_tie_breaker_node (by default the node
> > with the lowest ID) will cause the other partition to become inquorate,
> > and it will self-fence. In 2-node clusters with auto_tie_breaker this
> > means that failure of the node favoured by auto_tie_breaker_node
> > (typically nodeid 1) will result in a reboot of the other node
> > (typically nodeid 2) when it detects the inquorate state. If this is
> > undesirable, corosync-qdevice can be used instead of auto_tie_breaker to
> > provide an additional vote to quorum, making the behaviour closer to
> > that of odd-member clusters."
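
For anyone finding this thread later: the behaviour quoted above corresponds
to a corosync.conf quorum section roughly like the sketch below, and the
qdevice alternative to a single pcs command (the qnetd host name is only a
placeholder for a third machine running corosync-qnetd):

    # corosync.conf -- 2-node cluster relying on auto_tie_breaker
    quorum {
        provider: corosync_votequorum
        auto_tie_breaker: 1
        auto_tie_breaker_node: lowest
    }

    # alternative: add a third vote via corosync-qdevice instead
    pcs quorum device add model net host=qnetd-host.example.com algorithm=ffsplit
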
> >
> > Thanks
> >
> >
> > On Sun, 29 Apr 2018 at 02:15, Digimer <lists at alteeve.ca> wrote:
> >
> > On 2018-04-28 09:06 PM, Wei Shan wrote:
> > > Hi all,
> > >
> > > If I have a 2-node cluster with ATB enabled and the node with the
> > > lowest node ID has failed, what will happen? My assumption is that the
> > > node with the higher node ID will self-fence and be rebooted. What
> > > happens after that?
> > >
> > > Thanks!
> > >
> > > --
> > > Regards,
> > > Ang Wei Shan
> >
> > Which cluster stack is this? I am not familiar with the term "ATB".
> >
> > If it's a standard pacemaker or cman/rgmanager cluster, then on node
> > failure, the good node should block and request a fence (a lost node is
> > not allowed to be assumed gone via self fence, except when using a
> > watchdog timer based fence agent). If the fence doesn't work, the
> > survivor should remain blocked (better to hang than risk corruption). If
> > the fence succeeds, then the survivor node will recover any lost
> > services based on the configuration of those services (usually a simple
> > (re)start on the good node).
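
To make that last point concrete for later readers, "recover any lost
services based on the configuration of those services" ends up being ordinary
resource configuration. A minimal sketch with a preferred node (the resource
name, IP and node names are made up):

    # a simple floating IP resource; parameters are placeholders
    pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24

    # prefer node1 while it is available; if node1 fails and is fenced,
    # node2 simply (re)starts vip, and it may move back later depending on
    # resource-stickiness
    pcs constraint location vip prefers node1=100
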
> >
> > --
> > Digimer
> > Papers and Projects: https://alteeve.com/w/
> > "I am, somehow, less interested in the weight and convolutions of
> > Einstein’s brain than in the near certainty that people of equal talent
> > have lived and died in cotton fields and sweatshops." - Stephen Jay
> > Gould
> >
> >
> >
> > --
> > Regards,
> > Ang Wei Shan
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>
--
Regards,
Ang Wei Shan