[ClusterLabs] resource start after network reconnected
Ken Gaillot
kgaillot at redhat.com
Fri Nov 19 12:45:11 EST 2021
On Fri, 2021-11-19 at 10:40 -0500, john tillman wrote:
<snip>
> > If pacemaker tries to stop resources due to out of quorum condition,
> > you could set suitable failure-timeout; this will be equivalent to
> > using "pcs resource refresh". Keep in mind that pacemaker only checks
> > for failure-timeout expiration every cluster-recheck-interval (15
> > minutes by default).
That's true only for Pacemaker versions less than 2.0.3; since 2.0.3,
the cluster rechecks as soon as the timeout hits.
> > This still is not directly related to network availability, but if
> > network outage resulted in node going out of quorum, when network is
> > back and node joined cluster again it will allow resources to be
> > started on node.
> >
>
> When quorum is lost I want all the resources to stop. The cluster is
> performing this step correctly for me.
As long as it's working properly. If quorum is lost because one of the
nodes is malfunctioning -- maybe a device driver locked up the system,
or CPU wait is horrific due to an out-of-control process or disk
failure -- then that node will not know quorum has been lost and will
not stop resources. If the condition then clears up, suddenly you have
split-brain with two nodes running resources.
>
> That cluster-recheck-interval would explain the intermittence I saw
> this morning. If I set that to 1 minute would that cause any gross
> negative issues?
It increases CPU usage and IPC traffic. For Pacemaker 2.0.3 or later, I
definitely wouldn't bother. For older versions, 1 minute feels a bit
much; I would go with around 5 minutes.
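To make that rule of thumb concrete, here is a minimal sketch, not an
official tool. The helper names (parse_version, recheck_interval_command)
and the constant are hypothetical, and the version string is assumed to be
supplied by you. The sketch only builds the "pcs property set" command for
you to review; it does not run anything.

```python
import re
from typing import List, Optional, Tuple

# Assumption taken from the reply above: since 2.0.3 the cluster rechecks
# as soon as a failure-timeout expires, so tuning is not worthwhile there.
TIMELY_RECHECK_VERSION = (2, 0, 3)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '2.0.3' or '1.1.23-1.el7' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Pacemaker version: {text!r}")
    return tuple(int(part) for part in match.groups())


def recheck_interval_command(version: str, interval: str = "5min") -> Optional[List[str]]:
    """Return a pcs command lowering cluster-recheck-interval, or None.

    None means the version is 2.0.3 or later and the default can stay as is.
    The 5-minute default mirrors the suggestion above for older versions.
    """
    if parse_version(version) >= TIMELY_RECHECK_VERSION:
        return None
    return ["pcs", "property", "set", f"cluster-recheck-interval={interval}"]


if __name__ == "__main__":
    for candidate in ("1.1.23", "2.0.3", "2.1.0"):
        print(candidate, recheck_interval_command(candidate))
```

Returning the command as a list instead of executing it keeps the sketch
safe to try on a machine that is not a cluster node.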
>
> Is there another setting besides cluster-recheck-interval to consider
> adjusting to start mysql when quorum is returned?
>
> Thank you for the feedback.
>
> -John
--
Ken Gaillot <kgaillot at redhat.com>