[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown
Maxim
wizofta at rambler.ru
Mon Feb 12 07:02:32 EST 2018
Hello,
[Sorry for the duplicate message. The web mail client ruined the
formatting of the previous e-mail =( ]
I have a simple two-node cluster (built via the RHEL 6 pcs interface)
with multiple master/slave resources, fencing disabled, and a single
sync interface.
Mostly everything works. But there is a performance problem when the
master node is powered off (hard): the slave node takes anywhere from
about 100 to 3500 ms to detect that the master is down. The main
question is how to avoid this delay of up to roughly 3 seconds that
occurs sometimes.
On the slave node I have a little script that checks the connection to
the master node. It detects the breakage of the sync link within about
100 ms. But corosync sometimes needs much more time to figure out the
situation and mark the master node as offline. Until then it shows the
ring status as 'ok'.
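The connectivity check itself is not shown in this message. A minimal
sketch of such a check, assuming a plain TCP connect with a short
timeout, might look like the following. The function names, the port,
and the rule of two consecutive misses are illustrative assumptions, not
the script actually in use.

```python
import socket
import time


def peer_reachable(host: str, port: int, timeout: float = 0.1) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def wait_for_failure(host, port, interval=0.05, failures_needed=2, timeout=0.1):
    """Block until the peer misses `failures_needed` consecutive checks.

    Returns the number of seconds spent waiting.
    """
    started = time.monotonic()
    misses = 0
    while misses < failures_needed:
        if peer_reachable(host, port, timeout):
            misses = 0  # any success resets the counter
        else:
            misses += 1
        if misses < failures_needed:
            time.sleep(interval)
    return time.monotonic() - started
```

For example, wait_for_failure("main-node", 22) would return once two
consecutive connects to port 22 (a hypothetical choice of service) have
failed. After a hard power-off a failed connect typically takes the full
timeout, because no reset comes back from the dead node.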
If I understand correctly, then:
1 pacemaker actions (crm_resource --move) will not be performed until
corosync has refreshed its ring state
2 detection of the problem (on the corosync side) can be sped up via
timeout tuning in corosync.conf
3 there is no way to ask corosync to recheck its ring status or to mark
a ring as failed manually
But maybe I'm missing something.
All I want is to move resources faster.
In my little script I tried to force the cluster software to move
resources to the slave node, but I've had no success so far.
Could you please share your thoughts on the situation?
Thank you in advance.
Cluster software:
corosync - 2.4.3
pacemaker - 1.1.18
libqb - 1.0.2
corosync.conf:
totem {
    version: 2
    secauth: off
    cluster_name: cluster
    transport: udpu
    token: 2000
}
nodelist {
    node {
        ring0_addr: main-node
        nodeid: 1
    }
    node {
        ring0_addr: reserve-node
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
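The timeout tuning mentioned in point 2 above could be sketched as a
change to the totem section. The value 500 is an assumption chosen only
for illustration, not a tested recommendation. A shorter token timeout
shortens detection, but it also makes a false node-loss verdict more
likely on a loaded network.

```
totem {
    version: 2
    secauth: off
    cluster_name: cluster
    transport: udpu
    # Assumed value for illustration: token loss is declared after 500 ms
    # instead of the 2000 ms configured above.
    token: 500
    # consensus is left unset here, so corosync derives it as 1.2 * token.
}
```

Membership is only re-formed after the token timeout and the consensus
timeout have both run their course, so the worst-case detection time is
longer than the token value alone.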
Regards,
Maxim.