[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

Maxim wizofta at rambler.ru
Mon Feb 12 07:02:32 EST 2018


Hello,

[Sorry for a message duplication. Web mail client ruined the formatting 
of the previous e-mail =( ]

There is a simple configuration of two cluster nodes (built via RHEL 6 
pcs interface) with multiple master/slave resources, disabled fencing 
and the single sync interface.

Everything works fine for the most part, but there is a performance 
problem with the cluster's reaction when the master node is powered off 
hard: the slave node detects that the master is down only after roughly 
100-3500 ms. The main question is how to avoid the delays of up to ~3 s 
that sometimes occur.

On the slave node I have a small script that checks the connection to 
the master node; it detects a breakage of the sync link within about 
100 ms. But corosync sometimes needs much more time to recognize the 
situation and mark the master node as offline; meanwhile it still 
reports an 'ok' ring status.

If I understand correctly, then:
1. Pacemaker actions (crm_resource --move) will not take effect until 
corosync has refreshed its ring state.
2. Detection of the problem (on the corosync side) can be sped up by 
tuning the timeouts in corosync.conf.
3. There is no way to ask corosync to recheck its ring status, or to 
mark a ring as failed manually.
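If point 2 is right, then the tuning would presumably look something 
like the fragment below. The values are illustrative only, not tested; 
corosync.conf(5) documents token, token_retransmits_before_loss_const, 
and consensus (which must stay larger than token, defaulting to 
1.2 * token):

```
totem {
       version: 2
       secauth: off
       cluster_name: cluster
       transport: udpu
       token: 1000                            # lower token timeout -> faster failure detection
       token_retransmits_before_loss_const: 4 # fewer retransmits before declaring loss
       consensus: 1200                        # must be >= 1.2 * token per corosync.conf(5)
}
```

The trade-off, as I understand it, is that lowering the token timeout 
makes the cluster more sensitive to transient network hiccups.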

But maybe I'm missing something.

All I want is to move the resources over faster.
In my little script I tried to force the cluster software to move the 
resources to the slave node, but I have had no success so far.
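For reference, here is roughly what the script attempts (the resource 
id and hostnames are placeholders for my setup, and the ping deadline 
is just what I am experimenting with):

```shell
#!/bin/sh
# Sketch of the watchdog script (names are illustrative).
PEER=${PEER:-main-node}       # the node being monitored
RSC=${RSC:-ms_resource}       # hypothetical master/slave resource id
SELF=${SELF:-reserve-node}    # node the master role should move to

# One ICMP probe; -W sets the reply deadline in seconds.
peer_alive() {
    ping -c 1 -W 1 "$PEER" >/dev/null 2>&1
}

# Ask Pacemaker to prefer $SELF for $RSC. As noted above, the move is
# only acted on once corosync has declared the peer lost, so this alone
# does not bypass the token timeout.
failover() {
    crm_resource --move --resource "$RSC" --node "$SELF"
}

# Intended to run in a tight loop or from a timer:
# peer_alive || failover
```

This confirms the link loss quickly, but the crm_resource call seems to 
sit idle until corosync itself gives up on the peer.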

Could you please share your thoughts on this situation?
Thank you in advance.


Cluster software:
corosync - 2.4.3
pacemaker - 1.1.18
libqb - 1.0.2


corosync.conf:
totem {
       version: 2
       secauth: off
       cluster_name: cluster
       transport: udpu
       token: 2000
}

nodelist {
      node {
          ring0_addr: main-node
          nodeid: 1
      }

      node {
          ring0_addr: reserve-node
          nodeid: 2
      }
}

quorum {
      provider: corosync_votequorum
      two_node: 1
}


Regards,
Maxim.



