[ClusterLabs] Strange lost quorum with qdevice

Jan Friesse jfriesse at redhat.com
Mon Aug 12 01:46:24 EDT 2019


Олег Самойлов wrote:
> 
> 
>> On 9 Aug 2019, at 9:25, Jan Friesse <jfriesse at redhat.com> wrote:
>> Please do not set dpd_interval that high. dpd_interval on the qnetd side is not about how often the ping is sent. Could you please retry your test with dpd_interval=1000? I'm pretty sure it will work then.
>>
>> Honza
> 
> Yep. As far as I understand, dpd_interval of qnetd and timeout and sync_timeout of qdevice are somehow linked. By default they are dpd_interval=10, timeout=10, sync_timeout=30. And you advised to change them proportionally.

Yes, timeout and sync_timeout should be changed proportionally. 
dpd_interval is a different story.

> 
> https://github.com/ClusterLabs/sbd/pull/76#issuecomment-486952369
> 
> But the mechanics of how they depend on each other are mysterious and not documented.

Let me try to shed some light on this:

- dpd_interval is a qnetd variable that controls how often qnetd walks 
through the list of all clients (qdevices) and checks the timestamp of 
the last message sent by each client. If the difference between the 
current timestamp and that timestamp is larger than 2 * the timeout 
sent by the client, the client is considered dead.

- timeout - affects how often qdevice sends a heartbeat to corosync 
about its liveness (every timeout / 2) and also how often it sends a 
heartbeat to qnetd (0.8 * timeout). On the corosync side it is used as 
the timeout after which the qdevice daemon is considered dead and its 
votes are no longer valid.

- sync_timeout - Not used by qdevice/qnetd. Used by corosync during the 
sync phase. If corosync doesn't get a reply from qdevice within this 
timeout, it considers the qdevice daemon dead and continues the sync 
process.

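For reference, a rough sketch of where each knob lives. The values 
below are the defaults (in milliseconds); the net host and algorithm 
are just placeholders, so adjust them to your setup:

   # corosync.conf on the cluster nodes (quorum.device section)
   quorum {
       provider: corosync_votequorum
       device {
           model: net
           timeout: 10000          # qdevice heartbeat/membership timeout
           sync_timeout: 30000     # used by corosync during sync phase
           net {
               host: qnetd-host    # placeholder
               algorithm: ffsplit
           }
       }
   }

   # On the qnetd server dpd_interval has no corosync.conf equivalent;
   # if I recall the option name correctly it is passed as an advanced
   # setting on the corosync-qnetd command line, e.g.:
   corosync-qnetd -S dpd_interval=10000
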
> 
> I rechecked the test with the 20-60 combination. I got the same problem on the 16th failure simulation. Qnetd returned the vote in exactly the same second qdevice expected it, but just slightly too late. So the node lost quorum, got the vote slightly later, but didn't regain quorum, maybe due to the 'wait for all' option.
> 
> I retried the default 10-30 combination. I got the same problem on the first failure simulation. Qnetd sent the vote 1 second later than expected.
> 
> The next combination was 1-3 (dpd_interval=1, timeout=1, sync_timeout=3). The same problem on the 11th failure simulation. Qnetd returned the vote in exactly the same second qdevice expected it, but just slightly too late. So the node lost quorum, got the vote slightly later, but didn't regain quorum, maybe due to the 'wait for all' option. And the node was later rebooted by the watchdog due to lack of quorum.

It was probably not evident from my reply, but what I meant was to 
change just dpd_interval. Could you please recheck with dpd_interval=1, 
timeout=20, sync_timeout=60?

Honza


> 
> So, my conclusions:
> 
> 1. IMHO this bug may depend not on the absolute value of dpd_interval, but on the proportion between dpd_interval of qnetd and the timeout/sync_timeout of qdevice. Because of this, I can not predict how to change these options to work around this behaviour.
> 2. IMHO "wait for all" also bugged. According on documentation it must fire only on the start of cluster, but looked like it fire every time when quorum (or all votes) is lost.
> 
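Regarding wait_for_all, for reference it is a votequorum option set in 
the quorum section of corosync.conf next to the device settings (a 
minimal sketch; see votequorum(5) for the documented semantics):

   quorum {
       provider: corosync_votequorum
       wait_for_all: 1         # 0 disables it
       # device { ... } as above
   }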


