[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

Klaus Wenninger kwenning at redhat.com
Mon Feb 12 10:46:23 EST 2018


On 02/12/2018 04:34 PM, Maxim wrote:
> 12.02.2018 16:15, Klaus Wenninger wrote:
>> On 02/12/2018 01:02 PM, Maxim wrote:
> > fencing-disabled is probably due to it being a test setup ... Since
> > RHEL 6 pcs is made for configuring a cman-pacemaker setup, I'm not
> > sure if it is advisable to do a corosync-2 pacemaker setup with
> > that. You've obviously edited corosync.conf to reflect that ...
> It is OK. Fencing is not required at this time.
> It works well with the latest stable corosync and pacemaker, which
> were built manually (not from RHEL 6 repos).
> And the attached config was generated by this pcs (I've removed the
> 'logging' section from it to reduce the message size).
>
>>
> >>
> >> Mostly everything is OK. But there is a problem with how quickly
> >> the cluster reacts when the master node is powered off (hard):
> >> the slave node detects that the master is down after about
> >> 100-3500 ms. The main question is how to avoid this 3 sec delay
> >> that sometimes occurs.
> >
> > Kind of interesting that you ever get a detection below 2000ms with
> > the token-timeout set to that value. (Given you are doing a
> > hard-shutdown that doesn't give corosync time to sign off.) You've
> > derived these times from the corosync-logs!?
> >
> > Regards, Klaus
> >
> Not actually. After your message I've conducted some more
> investigation with quite active logging on the master node to get the
> real time when the node goes down. And... you are right. The delay is
> close to 4 seconds. So there is a [floating] bug in my script.
> Thank you for your insight, Klaus =)
>
> But nevertheless, is there any mechanism to force the slave corosync
> "to think" that the master corosync is down?
> [I have looked at what corosync-cfgtools can do but it doesn't seem
> to contain similar functionality]
> Or maybe there are some other ways?

Maybe a few notes on the other way ;-)
In general it is not easy to get a reliable answer
to the question of whether the other node is down
within, let's say, just 100ms.
Think of network hiccups, scheduling issues and
the like ...
But if you are willing to accept false positives
you can reduce the token timeout of corosync
instead of having another script that tries to do
the job corosync is (amongst other things) made
for. (At least that is how I understood what you
are aiming to do.)
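
As a rough sketch of what that could look like (the value
below is purely an illustration and not a recommendation;
the only number from this thread is the 2000ms you have
now), the token timeout is set in the totem section of
corosync.conf:

```
totem {
    # ... keep your existing totem options here ...

    # Token timeout in milliseconds. 500 is an assumed example value:
    # lower values detect a dead node faster, but raise the risk of
    # false positives on network hiccups or scheduling delays.
    token: 500

    # consensus is deliberately left unset; per corosync.conf(5) it is
    # then calculated as 1.2 * token.
}
```

The totem settings should be identical on all nodes. As far
as I know a new membership is only formed once the consensus
timeout has passed as well, so expect the observed delay to
be noticeably longer than the token timeout alone.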

Regards,
Klaus

>
> Regards, Maxim
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


  



