[ClusterLabs] Antw: [EXT] Re: QDevice not found after reboot but appears after cluster restart

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 1 04:43:55 EDT 2022


>>> "john tillman" <johnt at panix.com> wrote on 29.07.2022 at 22:51 in message
<beb30bf64d4c615aff6034000038118c.squirrel at mail.panix.com>:
>> > On Thursday 28 July 2022 at 22:17:01, john tillman wrote:
>>>
>>>> I have a two-node cluster setup with a qdevice. 'pcs quorum status' from a
>>>> cluster node shows the qdevice casting a vote.  On the qdevice node
>>>> 'corosync-qnetd-tool -s' says I have 2 connected clients and 1 cluster.
>>>> The vote count looks correct when I shut down either one of the cluster
>>>> nodes or the qdevice.  So the voting seems to be working at this point.
>>>
>>> Indeed: shutting down 1 of 3 nodes leaves quorum intact, so everything
>>> still awake knows what's going on.
>>>
>>>> From this state, if I reboot both my cluster nodes at the same time
>>>
>>> Ugh!
>>>
>>>> but leave the qdevice node running, the cluster will not see the
>>>> qdevice when the nodes come back up: 'pcs quorum status' shows 3 votes
>>>> expected but only 2 votes cast (from the cluster nodes).
>>>
>>> I would think this is to be expected, since if you reboot 2 out of 3
>>> nodes, you completely lose quorum, so the single node left has no idea
>>> what to trust when the other nodes return.
>>
>> No, no.  I do have quorum after the reboots, courtesy of the 2 cluster
>> nodes casting their votes.  However, the qdevice is not casting a vote,
>> so I am down to 2 out of 3 votes.
>>
>> And the qdevice is not part of the cluster.  It will never have any
>> resources running on it.  Its job is just to vote.
>>
>> -John
>>
> 
> I thought maybe the problem was that the network wasn't ready when
> corosync.service started, so I added "ExecStartPre=/usr/bin/sleep 10"
> to it, but that didn't change anything.

This type of fix is broken anyway: you don't want a fixed delay, you want to
wait for an event (the network being up).
Basically, the OS distribution should already have configured this correctly.
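
To illustrate the difference between a fixed delay and waiting for an event,
here is a minimal bash sketch. `wait_for` is a hypothetical helper (not part
of corosync, pcs, or systemd) that retries a check command until it succeeds
or a timeout expires. For service startup, ordering on a systemd target is
normally the right tool rather than a hand-written loop like this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for a condition (an "event") instead of sleeping
# for a fixed time. Not part of corosync, pcs, or systemd.
# Usage: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds (checked about once per second),
# or 1 if TIMEOUT_SECONDS elapse first.
wait_for() {
    local timeout=$1
    shift
    # SECONDS is a bash builtin counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if (( SECONDS >= deadline )); then
            return 1    # event never happened within the timeout
        fi
        sleep 1
    done
    return 0            # event observed; no time wasted beyond that
}
```

For example, `wait_for 30 ping -c 1 -W 1 qnetd-host` (where `qnetd-host` is a
placeholder) returns as soon as the host answers, instead of always sleeping
10 seconds whether or not the network is ready.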

In SLES15, corosync.service contains:
Requires=network-online.target
After=network-online.target
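
If a distribution's corosync.service lacks these lines, a systemd drop-in is
one way to add them without editing the packaged unit. This is a minimal bash
sketch, assuming systemd and bash; the function name `write_corosync_dropin`
and the file name `10-network-online.conf` are hypothetical choices. Note that
network-online.target only delays startup meaningfully if the wait-online
service of the network stack in use (for example
NetworkManager-wait-online.service) is enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in adding the same two directives that
# SLES15 ships in corosync.service. Any *.conf file in corosync.service.d/
# is merged into the unit by systemd.
# Optional argument: base unit directory (default: /etc/systemd/system).
write_corosync_dropin() {
    local unit_dir=${1:-/etc/systemd/system}
    local dropin_dir="$unit_dir/corosync.service.d"
    mkdir -p "$dropin_dir" || return 1
    # Directives copied from the SLES15 unit quoted above.
    cat > "$dropin_dir/10-network-online.conf" <<'EOF'
[Unit]
Requires=network-online.target
After=network-online.target
EOF
}
```

After writing the drop-in, run `systemctl daemon-reload`; `systemctl cat
corosync.service` then shows the packaged unit followed by the drop-in.
`systemctl edit corosync.service` is the interactive alternative.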

> 
> I could still use some advice on debugging this oddity.  Or have I used
> up my quota of questions for this year? :-)
> 
> -John
> 
>>>
>>> Starting from a situation such as this, your only hope is to rebuild the
>>> cluster from scratch, IMHO.
>>>
>>>
>>> Antony.
>>>
>>> --
>>> Police have found a cartoonist dead in his house.  They say that details
>>> are currently sketchy.
>>>
>>>                                       Please reply to the list;
>>>                                             please *don't* CC me.
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/ 
>>>
>>>
>>
>>
> 




