[ClusterLabs] Antw: [EXT] Re: QDevice not found after reboot but appears after cluster restart
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 1 04:43:55 EDT 2022
>>> "john tillman" <johnt at panix.com> wrote on 29.07.2022 at 22:51 in message
<beb30bf64d4c615aff6034000038118c.squirrel at mail.panix.com>:
>>> On Thursday 28 July 2022 at 22:17:01, john tillman wrote:
>>>
>>>> I have a two-node cluster setup with a qdevice. 'pcs quorum status' from a
>>>> cluster node shows the qdevice casting a vote. On the qdevice node
>>>> 'corosync-qnetd-tool -s' says I have 2 connected clients and 1 cluster.
>>>> The vote count looks correct when I shut down either one of the cluster
>>>> nodes or the qdevice. So the voting seems to be working at this point.
>>>
>>> Indeed - shutting down 1 of 3 nodes leaves quorum intact, therefore
>>> everything still awake knows what's going on.
>>>
>>>> From this state, if I reboot both my cluster nodes at the same time
>>>
>>> Ugh!
>>>
>>>> but leave the qdevice node running, the cluster will not see the qdevice
>>>> when the nodes come back up: 'pcs quorum status' shows 3 votes expected
>>>> but only 2 votes cast (from the cluster nodes).
>>>
>>> I would think this is to be expected, since if you reboot 2 out of 3
>>> nodes, you completely lose quorum, so the single node left has no idea
>>> what to trust when the other nodes return.
>>
>> No, no. I do have quorum after the reboots. It is courtesy of the 2
>> cluster nodes casting their quorum votes. However, the qdevice is not
>> casting a vote, so I am down to 2 out of 3 votes.
>>
>> And the qdevice is not part of the cluster. It will never have any
>> resources running on it. Its job is just to vote.
>>
>> -John
>>
>
> I thought maybe the problem was that the network wasn't ready when
> corosync.service started, so I added an "ExecStartPre=/usr/bin/sleep 10"
> line to it, but that didn't change anything.
This type of fix is broken anyway: you should not be delaying for a fixed
time, you should be waiting for an event (network up).
Basically, the OS distribution should already have configured this correctly.
In SLES15, corosync.service has:
Requires=network-online.target
After=network-online.target
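On a distribution whose corosync.service lacks these two lines, a minimal
sketch of adding them is a systemd drop-in, assuming the unit is named
corosync.service as above. The file name network-online.conf is only an
illustrative choice; any *.conf file in /etc/systemd/system/corosync.service.d/
is merged into the unit ('systemctl edit corosync.service' creates one named
override.conf).

```ini
# /etc/systemd/system/corosync.service.d/network-online.conf
# Hypothetical file name; any *.conf in this directory is read.
[Unit]
# Pull in network-online.target and order corosync after it,
# instead of sleeping for a fixed time in ExecStartPre.
Requires=network-online.target
After=network-online.target
```

After creating it, run 'systemctl daemon-reload' and check the merged result
with 'systemctl cat corosync.service'. Note that network-online.target only
actually waits if a wait-online service (e.g. NetworkManager-wait-online.service
or systemd-networkd-wait-online.service) is enabled.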
>
> I could still use some advice on debugging this oddity. Or have I used
> up my quota of questions for this year? :-)
>
> -John
>
>>>
>>> Starting from a situation such as this, your only hope is to rebuild the
>>> cluster from scratch, IMHO.
>>>
>>>
>>> Antony.
>>>
>>> --
>>> Police have found a cartoonist dead in his house. They say that details
>>> are currently sketchy.
>>>
>>> Please reply to the list; please *don't* CC me.
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>>>
>>
>>
>
>
More information about the Users mailing list