[ClusterLabs] reducing corosync-qnetd "response time"

Thu Oct 24 13:30:34 EDT 2019

24.10.2019 16:54, Sherrard Burton wrote:
> background:
> we are upgrading a (very) old HA cluster running heartbeat, DRBD, and NFS,
> with no stonith, to a much more modern implementation. for the existing
> cluster, as well as the new one, the disk space requirements make
> running a full three-node cluster infeasible, so i am trying to
> configure a quorum-only node using corosync-qnetd.
> 
> the installation went fine, the nodes can communicate, etc, and the
> cluster seems to perform as desired when gracefully shutting down or
> restarting a node. but during my torture testing, simulating a node
> crash by stopping the network on one node leaves the remaining node in
> limbo for approximately 20 seconds before it and the quorum-only node
> decide that they are indeed quorate.
> 
> the problem:
> the intended implementation involves DRBD, and its resource-level
> fencing freezes IO during the time that the remaining node is inquorate
> in order to avoid any possible data divergence/split-brain. this
> precaution is obviously desirable, and is the reason that i am trying to
> configure this cluster "properly".
> 
> my (admittedly naive) expectation is that the remaining node and the
> quorum-only node would continue ticking along as if nothing happened,
> and i am hoping that this delay is due to some
> misconfiguration/oversight/bone-headedness on my part.
> 
> so i am seeking understanding on the reason for this delay, and whether
> there is any (prudent) way to reduce it. of course, any other advice on
> the intended setup is welcome as well.
> 
> please let me know if you require any additional details.
> 


You may be interested in this discussion

https://www.mail-archive.com/users@clusterlabs.org/msg08907.html