[ClusterLabs] Antw: [EXT] Re: QDevice not found after reboot but appears after cluster restart

john tillman johnt at panix.com
Mon Aug 1 10:18:22 EDT 2022


>>>> "john tillman" <johnt at panix.com> wrote on 29.07.2022 at 22:51 in
> message
> <beb30bf64d4c615aff6034000038118c.squirrel at mail.panix.com>:
>>> > On Thursday 28 July 2022 at 22:17:01, john tillman wrote:
>>>>
>>>>> I have a two-node cluster setup with a qdevice. 'pcs quorum status'
>>>>> from a cluster node shows the qdevice casting a vote.  On the qdevice
>>>>> node, 'corosync-qnetd-tool -s' says I have 2 connected clients and 1
>>>>> cluster.  The vote count looks correct when I shut down either one of
>>>>> the cluster nodes or the qdevice.  So the voting seems to be working
>>>>> at this point.
>>>>
>>>> Indeed - shutting down 1 of 3 nodes leaves quorum intact, therefore
>>>> everything still awake knows what's going on.
>>>>
>>>>> From this state, if I reboot both my cluster nodes at the same time
>>>>
>>>> Ugh!
>>>>
>>>>> but leave the qdevice node running, the cluster will not see the
>>>>> qdevice when the nodes come back up: 'pcs quorum status' shows 3
>>>>> votes expected but only 2 votes cast (from the cluster nodes).
>>>>
>>>> I would think this is to be expected, since if you reboot 2 out of 3
>>>> nodes, you completely lose quorum, so the single node left has no idea
>>>> what to trust when the other nodes return.
>>>
>>> No, no.  I do have quorum after the reboots.  It is courtesy of the 2
>>> cluster nodes casting their quorum votes.  However, the qdevice is not
>>> casting a vote, so I am down to 2 out of 3 votes.
>>>
>>> And the qdevice is not part of the cluster.  It will never have any
>>> resources running on it.  Its job is just to vote.
>>>
>>> -John
>>>
>>
>> I thought maybe the problem was that the network wasn't ready when
>> corosync.service started, so I forced an "ExecStartPre=/usr/bin/sleep 10"
>> into it, but that didn't change anything.
>
> This type of fix is broken anyway: what you need is not a fixed delay but
> to wait for an event (network up).
> Basically the OS distribution should have configured it correctly already.
>
> In SLES15 there is:
> Requires=network-online.target
> After=network-online.target
>

Thank you for the response.

Yes, I saw that those values were correctly set in the service
configuration file for corosync.  The delay was just a test: I wanted to
rule out a race between bringing up the bond and trying to connect to the
quorum node.
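
For anyone wanting to repeat the unit-file check, here is a minimal sketch in Python. It only looks for the two lines quoted above in the text of a unit file. The helper names and the sample fragment are made up for illustration, and it handles plain 'Key=value' lines only.

```python
def unit_dependencies(unit_text):
    """Collect Requires= and After= values from systemd unit file text.

    Hypothetical helper: handles simple 'Key=value' lines only.
    """
    deps = {"Requires": set(), "After": set()}
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and section headers such as [Unit].
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in deps:
            # A single line may list several space-separated units.
            deps[key].update(value.split())
    return deps


def waits_for_network(unit_text):
    """True if the unit both requires and is ordered after network-online.target."""
    deps = unit_dependencies(unit_text)
    target = "network-online.target"
    return target in deps["Requires"] and target in deps["After"]


# Illustrative fragment built from the two lines quoted above.
sample_unit = """\
[Unit]
Requires=network-online.target
After=network-online.target
"""

if __name__ == "__main__":
    print("waits for network:", waits_for_network(sample_unit))
```

This only confirms that the ordering is declared in the file; it says nothing about whether the bond was actually usable when corosync started.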

I was grep'ing the corosync log for VOTEQ entries and noticed that when it
works I see these two lines consecutively:
... [VOTEQ ] Sending quorum callback, quorate = 0
... [VOTEQ ] Received qdevice op 1 req from node 1 [QDevice]
When it does not work, I never see the 'Received qdevice...' line in the
log.  Is there something else I can look for to find this problem?  Some
other test you can think of?  Maybe some configuration of the votequorum
service?
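
The grep described above can be sketched in Python as follows. This is a rough illustration, not a tested diagnostic: it assumes the log lines look like the two samples quoted above, and the function name and sample lists are hypothetical.

```python
import re

# Patterns modelled on the two VOTEQ log lines quoted above.
CALLBACK = re.compile(r"\[VOTEQ\s*\] Sending quorum callback")
QDEVICE_OP = re.compile(r"\[VOTEQ\s*\] Received qdevice op (\d+) req from node (\d+)")


def qdevice_registered(log_lines):
    """Return True if a 'Received qdevice op' line follows a quorum callback.

    Hypothetical helper: lines matching neither pattern are ignored.
    """
    seen_callback = False
    for line in log_lines:
        if CALLBACK.search(line):
            seen_callback = True
        elif seen_callback and QDEVICE_OP.search(line):
            return True
    return False


# Illustrative samples built from the quoted lines (not real log files).
good_boot = [
    "... [VOTEQ ] Sending quorum callback, quorate = 0",
    "... [VOTEQ ] Received qdevice op 1 req from node 1 [QDevice]",
]
bad_boot = [
    "... [VOTEQ ] Sending quorum callback, quorate = 0",
]

if __name__ == "__main__":
    print("good boot:", qdevice_registered(good_boot))
    print("bad boot:", qdevice_registered(bad_boot))
```

Running it over the log from each boot would at least make the working and failing cases easy to tell apart; where the corosync log lives depends on the distribution and the logging settings.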


>>
>> I could still use some advice with debugging this oddity.  Or have I
>> used up my quota of questions this year :-)
>>
>> -John
>>
>>>>
>>>> Starting from a situation such as this, your only hope is to rebuild
>>>> the cluster from scratch, IMHO.
>>>>
>>>>
>>>> Antony.
>>>>
>>>> --
>>>> Police have found a cartoonist dead in his house.  They say that
>>>> details are currently sketchy.
>>>>
>>>>                                    Please reply to the list;
>>>>                                    please *don't* CC me.
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/