[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

Klaus Wenninger kwenning at redhat.com
Mon Feb 12 08:15:05 EST 2018


On 02/12/2018 01:02 PM, Maxim wrote:
> Hello,
>
> [Sorry for a message duplication. Web mail client ruined the
> formatting of the previous e-mail =( ]
>
> There is a simple configuration of two cluster nodes (built via RHEL 6
> pcs interface) with multiple master/slave resources, disabled fencing
> and the single sync interface.

Fencing being disabled is probably due to this being a test setup ...
RHEL 6 pcs was made for configuring a cman/pacemaker setup, so
I'm not sure it is advisable to use it to set up a corosync-2
pacemaker cluster. You've obviously edited corosync.conf to
reflect that ...
 
>
> Everything is mostly ok. But there is a problem with the cluster's
> reaction time when the master node is powered off (hard): the slave
> node detects that the master is down only after about 100-3500 ms.
> The main question is how to avoid this ~3 s delay that sometimes
> occurs.

Kind of interesting that you ever get detection below 2000 ms with the
token timeout set to that value, given that a hard shutdown doesn't
give corosync time to sign off.
Did you derive these times from the corosync logs?

Regards,
Klaus
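To put a rough number on the point above: corosync only declares a
processor failed after the token timeout expires, and forming the new
membership then takes up to the consensus timeout, which defaults to
1.2 x token per corosync.conf(5). A back-of-the-envelope sketch of the
resulting detection window (the helper name is illustrative, and the
bounds are approximate, not a guarantee from corosync):

```python
# Rough estimate of corosync's failure-detection window after a hard
# node death.  Assumes consensus defaults to 1.2 * token when not set
# explicitly (see corosync.conf(5)).
def detection_window_ms(token_ms, consensus_ms=None):
    if consensus_ms is None:
        consensus_ms = int(1.2 * token_ms)  # corosync's documented default
    # Approximate best case: the token timer was already close to expiry
    # when the node died.  Approximate worst case: a full token timeout
    # plus the consensus timeout for the new membership to form.
    return token_ms, token_ms + consensus_ms

best, worst = detection_window_ms(2000)  # token: 2000 from the posted conf
print(best, worst)                       # 2000 4400
```

With token: 2000 that puts the expected window well above the ~100 ms
the side-channel script achieves, which is consistent with the delays
observed.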

>
> On the slave node I have a little script that checks the connection to
> the master node. It detects a sync breakage within about 100 ms. But
> corosync sometimes requires much more time to figure out the situation
> and mark the master node as offline; meanwhile it still reports an
> 'ok' ring status.
>
> If I understand correctly, then:
> 1. pacemaker actions (crm_resource --move) will not be performed until
>    corosync has refreshed its ring state;
> 2. the detection of a problem (on the corosync side) can be sped up by
>    tuning timeouts in corosync.conf;
> 3. there is no way to ask corosync to recheck its ring status or to
>    mark a ring as failed manually.
>
> But maybe I'm missing something.
>
> All I want is to move resources faster.
> In my little script I tried to force the cluster software to move
> resources to the slave node, but I've had no success so far.
>
> Could you please share your thoughts on this situation?
> Thank you in advance.
>
>
> Cluster software:
> corosync - 2.4.3
> pacemaker - 1.1.18
> libqb - 1.0.2
>
>
> corosync.conf:
> totem {
>       version: 2
>       secauth: off
>       cluster_name: cluster
>       transport: udpu
>       token: 2000
> }
>
> nodelist {
>      node {
>          ring0_addr: main-node
>          nodeid: 1
>      }
>
>      node {
>          ring0_addr: reserve-node
>          nodeid: 2
>      }
> }
>
> quorum {
>      provider: corosync_votequorum
>      two_node: 1
> }
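
If point 2 of the list above is the route taken, the change amounts to
lowering the token timeout in the totem section, at the cost of a
higher risk of false failure detections on a congested or lossy link.
A hedged sketch of what that could look like; the values are
illustrative, not recommendations:

```
totem {
      version: 2
      secauth: off
      cluster_name: cluster
      transport: udpu
      token: 1000                              # lower timeout -> faster detection
      token_retransmits_before_loss_const: 4   # optional; 4 is the documented default
}
```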
>
>
> Regards,
> Maxim.
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



