[ClusterLabs] Failure of preferred node in a 2 node cluster
Wei Shan
weishan.ang at gmail.com
Mon Apr 30 11:26:00 EDT 2018
Hi!
Thanks for the prompt response.
Due to the Red Hat version, I can't use shared storage, so I'm using a
timer-based (watchdog-only) setup with stonith-watchdog-timeout set to 5
seconds.
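
For reference, the watchdog-only (diskless) sbd setup I'm describing looks
roughly like the sketch below; the watchdog device path is just a placeholder
for whatever the hardware exposes, and the usual guidance is to keep
stonith-watchdog-timeout comfortably above sbd's own watchdog timeout:

    # /etc/sysconfig/sbd -- no SBD_DEVICE set, i.e. diskless/watchdog-only mode
    SBD_WATCHDOG_DEV=/dev/watchdog

    # enable sbd cluster-wide (takes effect after a full cluster restart)
    pcs stonith sbd enable

    # pacemaker side: treat expiry of the lost node's watchdog as a completed fence
    pcs property set stonith-enabled=true
    pcs property set stonith-watchdog-timeout=5
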
On Sun, 29 Apr 2018 at 05:37, Digimer <lists at alteeve.ca> wrote:
> Ah, ok, now I get it.
>
> So node 2 should wait until it's confident that the lost node either
> shut down or was killed by its watchdog timer. After that, it will
> consider the node fenced and proceed with recovery. I don't think ATB
> will factor in here, as the cluster should treat this as a simple "node
> was lost, fencing finally worked, it's safe to recover now" thing.
>
> The node IDs shouldn't matter in this case. What decides the winner is
> who is allowed access to the shared storage. The one that can is allowed
> to keep kicking the watchdog. The one that loses access, assuming it is
> alive at all, should be forced off when the watchdog timer expires.
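
For the archives: what Digimer describes here sounds like sbd's shared-disk
("poison pill") mode, which I can't use in this environment. For completeness,
a rough sketch of that variant, with the device path purely a placeholder:

    # initialise the shared sbd slot device once, from any node
    sbd -d /dev/disk/by-id/EXAMPLE-SHARED-LUN create

    # /etc/sysconfig/sbd on every node
    SBD_DEVICE=/dev/disk/by-id/EXAMPLE-SHARED-LUN
    SBD_WATCHDOG_DEV=/dev/watchdog

    # stonith resource so pacemaker can write the poison pill to the disk
    pcs stonith create fence-sbd fence_sbd devices=/dev/disk/by-id/EXAMPLE-SHARED-LUN
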
>
> digimer
>
>
> On 2018-04-28 09:19 PM, Wei Shan wrote:
> > Hi,
> >
> > I'm using Red Hat Cluster Suite 7 with a watchdog-timer-based fence agent.
> > I understand this is a really bad setup, but this is what the end-user
> > wants.
> >
> > ATB => auto_tie_breaker
> >
> > "When auto_tie_breaker is used in even-member clusters, the failure of
> > the partition containing the auto_tie_breaker_node (by default the node
> > with the lowest ID) will cause the other partition to become inquorate,
> > and it will self-fence. In 2-node clusters with auto_tie_breaker this
> > means that failure of the node favoured by auto_tie_breaker_node
> > (typically nodeid 1) will result in a reboot of the other node
> > (typically nodeid 2) when it detects the inquorate state. If this is
> > undesirable, corosync-qdevice can be used instead of auto_tie_breaker to
> > provide an additional vote to quorum, making the behaviour closer to
> > that of odd-member clusters."
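
For anyone finding this thread later: the behaviour quoted above corresponds
to a corosync.conf quorum section roughly like the sketch below, and the
qdevice alternative to a single pcs command (the qnetd host name is only a
placeholder for a third machine running corosync-qnetd):

    # corosync.conf -- 2-node cluster relying on auto_tie_breaker
    quorum {
        provider: corosync_votequorum
        auto_tie_breaker: 1
        auto_tie_breaker_node: lowest
    }

    # alternative: add a third vote via corosync-qdevice instead
    pcs quorum device add model net host=qnetd-host.example.com algorithm=ffsplit
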
> >
> > Thanks
> >
> >
> > On Sun, 29 Apr 2018 at 02:15, Digimer <lists at alteeve.ca> wrote:
> >
> > On 2018-04-28 09:06 PM, Wei Shan wrote:
> > > Hi all,
> > >
> > > If I have a 2-node cluster with ATB enabled and the node with the
> > > lowest node ID has failed, what will happen? My assumption is that the
> > > node with the higher node ID will self-fence and be rebooted. What
> > > happens after that?
> > >
> > > Thanks!
> > >
> > > --
> > > Regards,
> > > Ang Wei Shan
> >
> > Which cluster stack is this? I am not familiar with the term "ATB".
> >
> > If it's a standard pacemaker or cman/rgmanager cluster, then on node
> > failure, the good node should block and request a fence (a lost node is
> > not allowed to be assumed gone via self fence, except when using a
> > watchdog timer based fence agent). If the fence doesn't work, the
> > survivor should remain blocked (better to hang than risk corruption). If
> > the fence succeeds, then the survivor node will recover any lost
> > services based on the configuration of those services (usually a simple
> > (re)start on the good node).
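
To make that last point concrete for later readers, "recover any lost
services based on the configuration of those services" ends up being ordinary
resource configuration. A minimal sketch with a preferred node (the resource
name, IP and node names are made up):

    # a simple floating IP resource; parameters are placeholders
    pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24

    # prefer node1 while it is available; if node1 fails and is fenced,
    # node2 simply (re)starts vip, and it may move back later depending on
    # resource-stickiness
    pcs constraint location vip prefers node1=100
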
> >
> > --
> > Digimer
> > Papers and Projects: https://alteeve.com/w/
> > "I am, somehow, less interested in the weight and convolutions of
> > Einstein’s brain than in the near certainty that people of equal talent
> > have lived and died in cotton fields and sweatshops." - Stephen Jay
> > Gould
> >
> >
> >
> > --
> > Regards,
> > Ang Wei Shan
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>
--
Regards,
Ang Wei Shan