[ClusterLabs] node1 and node2 communication time question

Ken Gaillot kgaillot at redhat.com
Tue Aug 9 10:08:05 EDT 2022


On Tue, 2022-08-09 at 15:23 +0900, 권오성 wrote:
> Hello.
> I installed linux ha on raspberry pi as below.
> 1) sudo apt-get install pacemaker pcs fence-agents resource-agents
> 2) Host Settings
> 3) sudo reboot
> 4) sudo passwd hacluster
> 5) sudo systemctl enable pcsd, sudo systemctl start pcsd, sudo
> systemctl enable pacemaker
> 6) sudo pcs cluster destroy
> 7) sudo pcs cluster auth <node1> <node2> -u hacluster -p <password
> for hacluster>
> 8) sudo pcs cluster setup --name <clusterName> <node1> <node2>
> 9) sudo pcs cluster start --all, sudo pcs cluster enable --all
> 10) sudo pcs property set stonith-enabled=false
> 11) sudo pcs status
> 12) sudo pcs resource create VirtualIP ocf:heartbeat:IPaddr2
> ip=<address> cidr_netmask=24 op monitor interval=30s
> 
> So, I've set it up this way.
> By the way, is it correct that node1 and node2 communicate every 30
> seconds, and that node2 will notice 30 seconds after node1 dies?
> Or do they communicate every few seconds?
> And can the communication interval between node1 and node2 be reduced?
> What I want is node1 and node2 to communicate every 10 ms and switch
> as fast as possible.
> Please answer.
> Thank you.

Unfortunately 10ms is not a realistic goal with the current software.

Node loss is detected by Corosync, which passes a token around all
nodes continuously. The token timeout is defined in
/etc/corosync/corosync.conf and defaults to either 1 or 3 seconds. With
2 nodes and a dedicated network for corosync traffic you can probably
get subsecond but I'm not sure what the practical limit is.
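
As a rough illustration (the 500 ms value is only an example to show
the syntax, not a recommendation), the relevant piece of
/etc/corosync/corosync.conf looks something like this:

    totem {
        version: 2
        cluster_name: <clusterName>
        # Token timeout in milliseconds. Lower values detect node loss
        # sooner, but values that are too low risk spurious membership
        # changes on a loaded network; test before relying on them.
        token: 500
    }

The value has to be the same on all nodes, and corosync needs to be
restarted (or told to reload its configuration) before it takes effect.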

Once node loss is detected, most of the switchover time is spent in
fencing (which should always be configured, otherwise you risk data
loss or service malfunctions) and in the stop/start time of your
individual resources.
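
For completeness, turning fencing back on would look roughly like the
sketch below. fence_ipmilan and its parameters are only placeholders;
a Raspberry Pi has no IPMI, so in practice you would use whichever
fence agent matches your hardware (a network power switch, for
example), and the parameter names differ per agent:

    sudo pcs property set stonith-enabled=true
    sudo pcs stonith create fence_node1 fence_ipmilan \
        pcmk_host_list=node1 ip=<bmc-address> \
        username=<user> password=<password>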

Resource loss is detected by recurring monitors. That's where the
interval=30s comes in; the cluster will check the resource's status
that often. You can reduce that; I would say 5 or 10s would be fine,
and even lower could be OK. The cluster has to run the scheduler,
invoke the resource agent, and record the result if it changed.
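
Assuming the IPaddr2 resource above was created with the id VirtualIP,
the monitor interval can be tightened afterwards with something along
these lines (exact syntax can vary a little between pcs versions):

    sudo pcs resource update VirtualIP op monitor interval=10s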

When resource loss is detected, the stop/start time of the resource is
the main factor.
-- 
Ken Gaillot <kgaillot at redhat.com>


