[ClusterLabs] resource start after network reconnected

Fri Nov 19 14:57:11 EST 2021

> On Fri, 2021-11-19 at 10:40 -0500, john tillman wrote:
>
> <snip>
>
>> > If pacemaker tries to stop resources due to an out-of-quorum
>> > condition, you could set a suitable failure-timeout; this will be
>> > equivalent to using "pcs resource refresh". Keep in mind that
>> > pacemaker only checks for failure-timeout expiration every
>> > cluster-recheck-interval (15
>
> That's true only for Pacemaker versions less than 2.0.3; since 2.0.3,
> the cluster rechecks as soon as the timeout hits.

I'm using Pacemaker 2.0.5 and it is *not* starting MySQL when quorum is
restored, at least not reliably (it worked roughly 1 time in 10). So I have
seen it work before, but I'm inclined to believe there was a user error in
that one successful sample.

We (actually a teammate) got MySQL to start when quorum is restored. It
required both setting cluster-recheck-interval to something shorter than
15 minutes and setting the mysql resource's failure-timeout to a non-zero
value. In our case we set both to 1 minute, with good results in the last
few tests. We can raise the interval above 1 minute, but for our tests,
1 minute proves it out.
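The two settings described above could look like this with `pcs`. This is a sketch that assumes the MySQL resource ID is `mysql` (the thread does not give the actual ID) and uses the 1-minute test values; adjust both for production.

```shell
# Assumption: the MySQL resource ID is "mysql"; substitute your own.

# Re-evaluate cluster state every minute instead of the 15-minute default.
pcs property set cluster-recheck-interval=1min

# Let the resource's failure record expire one minute after it is recorded,
# so Pacemaker may try to start it again once quorum is back.
pcs resource meta mysql failure-timeout=1min

# Manual alternative mentioned in the thread: make the cluster forget the
# resource's operation history (including failures) and re-detect its state.
pcs resource refresh mysql
```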

>
>> > minutes by default). This still is not directly related to network
>> > availability, but if a network outage resulted in the node going out
>> > of quorum, then once the network is back and the node has rejoined
>> > the cluster, resources will be allowed to start on that node.
>> >
>>
>> When quorum is lost I want all the resources to stop.  The cluster is
>> performing this step correctly for me.
>
> As long as it's working properly. If quorum is lost because one of the
> nodes is malfunctioning -- maybe a device driver locked up the system,
> or CPU wait is horrific due to an out-of-control process or disk
> failure -- then that node will not know quorum has been lost and will
> not stop resources. If the condition then clears up, suddenly you have
> split-brain with two nodes running resources.
>
>>
>> That cluster-recheck-interval would explain the intermittent behavior I
>> saw this morning. If I set it to 1 minute, would that cause any
>> significant negative side effects?
>
> It increases CPU usage and IPC traffic. For Pacemaker 2.0.3 or later, I
> definitely wouldn't bother. For older versions, 1 minute feels a bit
> aggressive; I would go with around 5 minutes.
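The version rule of thumb above can be encoded as a small helper. This is a hypothetical sketch, not a Pacemaker tool: `version_lt` and `suggest_recheck_interval` are made-up names, it assumes a `sort` that supports `-V` (e.g. GNU coreutils), and it only mirrors the thresholds stated in the thread (about 5 minutes below 2.0.3, the 15-minute default otherwise). John reports needing a shorter interval on 2.0.5 anyway, so treat the output as a starting point rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper; names are illustrative only.

# version_lt A B: succeed if version A sorts strictly before version B.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# suggest_recheck_interval VERSION: print a suggested cluster-recheck-interval.
suggest_recheck_interval() {
    if version_lt "$1" "2.0.3"; then
        # Older releases only notice failure-timeout expiry at each recheck.
        echo "5min"
    else
        # 2.0.3+ rechecks when the timeout expires; keep the 15min default.
        echo "15min"
    fi
}
```

The result could then be passed to `pcs property set cluster-recheck-interval=...`; how you obtain the installed version string depends on your distribution.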
>
>>
>> Is there another setting besides cluster-recheck-interval I should
>> consider adjusting so that mysql starts when quorum is restored?
>>
>> Thank you for the feedback.
>>
>> -John
>
> --
> Ken Gaillot <kgaillot at redhat.com>