[ClusterLabs] Can a two node cluster start resources if only one node is booted?
Andrei Borzenkov
arvidjaar at gmail.com
Thu Apr 21 12:44:05 EDT 2022
On 21.04.2022 18:26, john tillman wrote:
>> On 20.04.22 at 20:21, john tillman wrote:
>>>> On 20.04.2022 19:53, john tillman wrote:
>>>>> I have a two-node cluster that won't start any resources if only one
>>>>> node is booted; the pacemaker service does not start.
>>>>>
>>>>> Once the second node boots up, the first node starts pacemaker and the
>>>>> resources are started. All is well. But I would like the resources to
>>>>> start when the first node boots by itself.
>>>>>
>>>>> I thought the problem was with the wait_for_all option, but I have it
>>>>> set to "0".
>>>>>
>>>>> On the node that is booted by itself, when I run "corosync-quorumtool",
>>>>> I see:
>>>>>
>>>>> [root@test00 ~]# corosync-quorumtool
>>>>> Quorum information
>>>>> ------------------
>>>>> Date: Wed Apr 20 16:05:07 2022
>>>>> Quorum provider: corosync_votequorum
>>>>> Nodes: 1
>>>>> Node ID: 1
>>>>> Ring ID: 1.2f
>>>>> Quorate: Yes
>>>>>
>>>>> Votequorum information
>>>>> ----------------------
>>>>> Expected votes: 2
>>>>> Highest expected: 2
>>>>> Total votes: 1
>>>>> Quorum: 1
>>>>> Flags: 2Node Quorate
>>>>>
>>>>> Membership information
>>>>> ----------------------
>>>>> Nodeid Votes Name
>>>>> 1 1 test00 (local)
>>>>>
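The quorumtool output above already shows corosync treating the lone node as quorate (Expected votes 2, Total votes 1, Quorum 1, Quorate: Yes), which suggests corosync's quorum is not what keeps pacemaker from starting. As an illustrative sketch only, the same check can be scripted by parsing that output. The function names are hypothetical, and the parser assumes the human-readable "Key: Value" layout shown here, not a documented machine-readable format.

```python
import re

# Output copied from the single-node boot shown above.
SAMPLE = """\
Quorum information
------------------
Date: Wed Apr 20 16:05:07 2022
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 1
Ring ID: 1.2f
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 1
Flags: 2Node Quorate

Membership information
----------------------
Nodeid Votes Name
1 1 test00 (local)
"""

# A "Key: Value" header line, e.g. "Expected votes: 2".
KEY_VALUE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?):\s+(.*\S)\s*$")


def parse_quorumtool(output: str) -> dict:
    """Map each "Key: Value" line of corosync-quorumtool output to a dict.

    Section titles, dashed underlines and membership rows have no
    "Key: Value" shape and are skipped.
    """
    info = {}
    for line in output.splitlines():
        m = KEY_VALUE.match(line)
        if m:
            info[m.group(1)] = m.group(2)
    return info


def summarize(info: dict) -> dict:
    """Pull out the fields relevant to a single-node start."""
    flags = info.get("Flags", "").split()
    return {
        "quorate": info.get("Quorate") == "Yes",
        "two_node": "2Node" in flags,
        "expected_votes": int(info.get("Expected votes", "0")),
        "total_votes": int(info.get("Total votes", "0")),
        "quorum": int(info.get("Quorum", "0")),
    }


if __name__ == "__main__":
    print(summarize(parse_quorumtool(SAMPLE)))
    # {'quorate': True, 'two_node': True, 'expected_votes': 2, 'total_votes': 1, 'quorum': 1}
```

On a live node the input would come from running corosync-quorumtool, for example via subprocess.run(["corosync-quorumtool"], capture_output=True, text=True); the layout may differ between corosync versions.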
>>>>>
>>>>> My config file looks like this:
>>>>> totem {
>>>>>     version: 2
>>>>>     cluster_name: testha
>>>>>     transport: knet
>>>>>     crypto_cipher: aes256
>>>>>     crypto_hash: sha256
>>>>> }
>>>>>
>>>>> nodelist {
>>>>>     node {
>>>>>         ring0_addr: test00
>>>>>         name: test00
>>>>>         nodeid: 1
>>>>>     }
>>>>>
>>>>>     node {
>>>>>         ring0_addr: test01
>>>>>         name: test01
>>>>>         nodeid: 2
>>>>>     }
>>>>> }
>>>>>
>>>>> quorum {
>>>>>     provider: corosync_votequorum
>>>>>     two_node: 1
>>>>>     wait_for_all: 0
>>>>> }
>>>>>
>>>>> logging {
>>>>>     to_logfile: yes
>>>>>     logfile: /var/log/cluster/corosync.log
>>>>>     to_syslog: yes
>>>>>     timestamp: on
>>>>>     debug: on
>>>>>     syslog_priority: debug
>>>>>     logfile_priority: debug
>>>>> }
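For reference, votequorum(5) describes the quorum arithmetic behind this quorum section: normally quorum is a simple majority of the expected votes, two_node: 1 artificially lowers it to 1, and two_node implicitly enables wait_for_all unless wait_for_all is explicitly set to 0, as it is here. The sketch below is a simplified model of those documented rules, not corosync's implementation; the function name and the seen_all_nodes parameter are hypothetical.

```python
from typing import Optional, Tuple


def votequorum_state(expected_votes: int, total_votes: int,
                     two_node: bool = False,
                     wait_for_all: Optional[bool] = None,
                     seen_all_nodes: bool = False) -> Tuple[int, bool]:
    """Return (quorum, quorate) under a simplified reading of votequorum(5)."""
    # Simple majority: half of the expected votes plus one.
    quorum = expected_votes // 2 + 1
    # two_node: 1 lowers the threshold to a single vote in a 2-node cluster.
    if two_node and expected_votes == 2:
        quorum = 1
    # two_node implies wait_for_all unless it was set explicitly.
    if wait_for_all is None:
        wait_for_all = two_node
    quorate = total_votes >= quorum
    # Simplification: with wait_for_all, the first quorum needs every node
    # to have been seen at the same time at least once.
    if wait_for_all and not seen_all_nodes:
        quorate = False
    return quorum, quorate


if __name__ == "__main__":
    # John's config: two_node: 1, wait_for_all: 0, one node up.
    print(votequorum_state(2, 1, two_node=True, wait_for_all=False))  # (1, True)
    # Same cluster with wait_for_all left at its two_node default.
    print(votequorum_state(2, 1, two_node=True))                      # (1, False)
    # Plain two-node cluster without two_node.
    print(votequorum_state(2, 1))                                     # (2, False)
```

The first call reproduces the "Quorum: 1" and "Quorate: Yes" values that corosync-quorumtool reports above.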
>>>>>
>>>>> Fencing is disabled.
>>>>>
>>>>
>>>> That won't work.
>>>>
>>>>> I've also looked in "corosync.log", but I don't know what to look for
>>>>> to diagnose this issue. There are many lines similar to:
>>>>> [QUORUM] This node is within the primary component and will provide service.
>>>>> and
>>>>> [VOTEQ ] Sending quorum callback, quorate = 1
>>>>> and
>>>>> [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
>>>>>
>>>>> Is there something specific I should look for in the log?
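The corosync log lines quoted above describe only corosync's own quorum state. As a hypothetical helper (not a corosync tool), the VOTEQ flags line can be split into fields to confirm at a glance that the node is quorate and that wait-for-all ("WFA Status") is not holding it back. The parser assumes the exact "Name: Yes/No" layout of the quoted line.

```python
import re

# A VOTEQ flags line as quoted above from corosync.log.
LINE = ("[VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes "
        "Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No")

# "Name: Yes" or "Name: No"; names may contain spaces ("WFA Status").
FIELD = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(Yes|No)\b")


def parse_voteq_flags(line: str) -> dict:
    """Return {field: bool} for a '[VOTEQ ] flags:' log line, else {}."""
    if "flags:" not in line:
        return {}
    rest = line.split("flags:", 1)[1]
    return {name.strip(): value == "Yes" for name, value in FIELD.findall(rest)}


if __name__ == "__main__":
    flags = parse_voteq_flags(LINE)
    print(flags["quorate"], flags["WFA Status"])  # True False
```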
>>>>>
>>>>> So can a two-node cluster work after booting only one node? Maybe it
>>>>> never will, and I am wasting a lot of time, yours and mine.
>>>>>
>>>>> If it can, what else should I investigate?
>>>>>
>>>>
>>>> Before a node can start handling resources, it needs to know the status
>>>> of the other node. Without successful fencing, there is no way to
>>>> accomplish that.
>>>>
>>>> Yes, you can tell pacemaker to ignore the unknown status. Depending on
>>>> your resources, this could simply prevent normal operation or lead to
>>>> data corruption.
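The rule Andrei describes can be written as a small decision function. This is a toy model written for this thread, not Pacemaker's code; PeerState, may_start_resources and ignore_unknown_peer are hypothetical names. The closest documented Pacemaker knob to ignore_unknown_peer is probably the startup-fencing cluster property, which Pacemaker's documentation describes as unsafe to disable; check the documentation for your version before relying on that mapping.

```python
from enum import Enum


class PeerState(Enum):
    SEEN = "seen"        # peer is currently a cluster member
    FENCED = "fenced"    # peer confirmed down by a successful fence action
    UNKNOWN = "unknown"  # never seen and never fenced: it may be running resources


def may_start_resources(quorate: bool, peer: PeerState,
                        ignore_unknown_peer: bool = False) -> bool:
    """Toy model of the rule described above (hypothetical, not Pacemaker)."""
    if not quorate:
        return False
    if peer in (PeerState.SEEN, PeerState.FENCED):
        # The peer's state is known, so placing resources is safe.
        return True
    # UNKNOWN peer: starting anyway risks running the same resources twice.
    return ignore_unknown_peer


if __name__ == "__main__":
    # One node booted, fencing disabled, peer never seen.
    print(may_start_resources(True, PeerState.UNKNOWN))        # False
    print(may_start_resources(True, PeerState.UNKNOWN, True))  # True (unsafe)
    print(may_start_resources(True, PeerState.FENCED))         # True
```

With fencing disabled, the FENCED branch can never be reached, so a lone node is left with UNKNOWN unless the safety check is switched off.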
>>>
>>>
>>> Makes sense. Thank you.
>>>
>>> Perhaps some future enhancement could allow for this situation? It
>>> might be desirable in some cases to allow a single node to boot,
>>> determine quorum via two_node=1 and wait_for_all=0, and start resources
>>> without ever seeing the other node. Sure, there is a danger of split
>>> brain, but I can see special cases where I want the node to work alone
>>> for a period of time despite that danger.
>>>
>>
>> Hi John,
>>
>> How about 'pcs quorum unblock'?
>>
>> Regards,
>> Tomas
>>
>
>
> Tomas,
>
> Thank you for the suggestion. However, it didn't work. It returned:
> Error: unable to check quorum status
> crm_mon: Error: cluster is not available on this node
> I checked pacemaker, just in case, and it still isn't running.
>
Either pacemaker or some service it depends on attempted to start and
failed, or systemd is still waiting for some service that is required
before pacemaker. Check the logs, or provide "journalctl -b" output in
this state.
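Collecting what Andrei asks for can be scripted. This is a minimal sketch that assumes systemd and the conventional corosync and pacemaker unit names; the helper names are hypothetical. systemctl list-jobs shows jobs systemd is still waiting on, systemctl status shows whether a unit failed, and journalctl -b limits the journal to the current boot (left unfiltered, as requested).

```python
import subprocess
import sys
from typing import List

# Assumed unit names; adjust if your distribution packages them differently.
UNITS = ["corosync", "pacemaker"]


def diagnostic_commands(units: List[str]) -> List[List[str]]:
    """Commands that show why a unit has not started in this boot."""
    return [
        # Jobs systemd has queued but not finished (e.g. waiting on a dependency).
        ["systemctl", "list-jobs", "--no-pager"],
        # Current state and recent log lines of each unit.
        ["systemctl", "status", "--no-pager", *units],
        # Full journal of the current boot, as requested in the thread.
        ["journalctl", "-b", "--no-pager"],
    ]


def run(commands: List[List[str]]) -> None:
    for cmd in commands:
        print("$", " ".join(cmd), flush=True)
        try:
            # check=False: an inactive or failed unit makes systemctl exit non-zero.
            subprocess.run(cmd, check=False)
        except FileNotFoundError:
            print(f"{cmd[0]} not found on this system", file=sys.stderr)


if __name__ == "__main__":
    cmds = diagnostic_commands(UNITS)
    if "--run" in sys.argv:
        run(cmds)
    else:
        # Dry run: only show what would be executed.
        for cmd in cmds:
            print("$", " ".join(cmd))
```

Run it with --run as root and redirect the output to a file to attach to a reply; without --run it only prints the commands.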
> I'm very curious how I could convince the cluster to start its resources
> on one node when the other node is unable to boot. But I'm afraid the
> answer is to use fencing, add a third node to the cluster, or both.
>
> -John
>
>
>>> Thank you again.
>>>
>>>
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>