[ClusterLabs] Can a two node cluster start resources if only one node is booted?

Andrei Borzenkov arvidjaar at gmail.com
Thu Apr 21 12:44:05 EDT 2022


On 21.04.2022 18:26, john tillman wrote:
>> Dne 20. 04. 22 v 20:21 john tillman napsal(a):
>>>> On 20.04.2022 19:53, john tillman wrote:
>>>>> I have a two node cluster that won't start any resources if only one
>>>>> node
>>>>> is booted; the pacemaker service does not start.
>>>>>
>>>>> Once the second node boots up, the first node will start pacemaker and
>>>>> the
>>>>> resources are started.  All is well.  But I would like the resources
>>>>> to
>>>>> start when the first node boots by itself.
>>>>>
>>>>> I thought the problem was with the wait_for_all option but I have it
>>>>> set
>>>>> to "0".
>>>>>
>>>>> On the node that is booted by itself, when I run "corosync-quorumtool"
>>>>> I
>>>>> see:
>>>>>
>>>>>     [root at test00 ~]# corosync-quorumtool
>>>>>     Quorum information
>>>>>     ------------------
>>>>>     Date:             Wed Apr 20 16:05:07 2022
>>>>>     Quorum provider:  corosync_votequorum
>>>>>     Nodes:            1
>>>>>     Node ID:          1
>>>>>     Ring ID:          1.2f
>>>>>     Quorate:          Yes
>>>>>
>>>>>     Votequorum information
>>>>>     ----------------------
>>>>>     Expected votes:   2
>>>>>     Highest expected: 2
>>>>>     Total votes:      1
>>>>>     Quorum:           1
>>>>>     Flags:            2Node Quorate
>>>>>
>>>>>     Membership information
>>>>>     ----------------------
>>>>>         Nodeid      Votes Name
>>>>>              1          1 test00 (local)
>>>>>
>>>>>
>>>>> My config file looks like this:
>>>>>     totem {
>>>>>         version: 2
>>>>>         cluster_name: testha
>>>>>         transport: knet
>>>>>         crypto_cipher: aes256
>>>>>         crypto_hash: sha256
>>>>>     }
>>>>>
>>>>>     nodelist {
>>>>>         node {
>>>>>             ring0_addr: test00
>>>>>             name: test00
>>>>>             nodeid: 1
>>>>>         }
>>>>>
>>>>>         node {
>>>>>             ring0_addr: test01
>>>>>             name: test01
>>>>>             nodeid: 2
>>>>>         }
>>>>>     }
>>>>>
>>>>>     quorum {
>>>>>         provider: corosync_votequorum
>>>>>         two_node: 1
>>>>>         wait_for_all: 0
>>>>>     }
>>>>>
>>>>>     logging {
>>>>>         to_logfile: yes
>>>>>         logfile: /var/log/cluster/corosync.log
>>>>>         to_syslog: yes
>>>>>         timestamp: on
>>>>>         debug: on
>>>>>         syslog_priority: debug
>>>>>         logfile_priority: debug
>>>>>     }
>>>>>
>>>>> Fencing is disabled.
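[Editorial note: to confirm the two_node/wait_for_all settings corosync is actually using (as opposed to what corosync.conf says), the runtime cmap keys can be inspected on the booted node. A diagnostic sketch; the exact key names under runtime.votequorum may vary slightly by corosync version:]

```shell
# Dump the votequorum-related runtime state
corosync-cmapctl | grep -E 'votequorum|two_node|wait_for_all'

# Or query individual keys directly
corosync-cmapctl -g runtime.votequorum.two_node
corosync-cmapctl -g runtime.votequorum.wait_for_all_status
```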
>>>>>
>>>>
>>>> That won't work.
>>>>
>>>>> I've also looked in "corosync.log" but I don't know what to look for
>>>>> to
>>>>> diagnose this issue.  I mean there are many lines similar to:
>>>>> [QUORUM] This node is within the primary component and will provide
>>>>> service.
>>>>> and
>>>>> [VOTEQ ] Sending quorum callback, quorate = 1
>>>>> and
>>>>> [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes
>>>>> Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
>>>>>
>>>>> Is there something specific I should look for in the log?
>>>>>
>>>>> So can a two node cluster work after booting only one node?  Maybe it
>>>>> never will and I am wasting a lot of time, yours and mine.
>>>>>
>>>>> If it can, what else can I investigate further?
>>>>>
>>>>
>>>> Before a node can start handling resources it needs to know the status
>>>> of the other node. Without successful fencing there is no way to
>>>> accomplish that.
>>>>
>>>> Yes, you can tell pacemaker to ignore unknown status. Depending on your
>>>> resources this could simply prevent normal work or lead to data
>>>> corruption.
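[Editorial note: the knob Andrei is alluding to is, as far as I know, pacemaker's startup-fencing cluster property. A hedged sketch, with the same data-corruption caveat he gives; only consider it when you can guarantee the unseen peer is really powered off:]

```shell
# DANGEROUS: treat the never-seen peer as down instead of fencing it
pcs property set startup-fencing=false

# With fencing disabled entirely, as in this cluster
pcs property set stonith-enabled=false
```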
>>>
>>>
>>> Makes sense.  Thank you.
>>>
>>> Perhaps some future enhancement could allow for this situation?  I mean,
>>> it might be desirable in some cases to allow a single node to boot,
>>> determine quorum via two_node=1 and wait_for_all=0, and start resources
>>> without ever seeing the other node.  Sure, there is a danger of split
>>> brain, but I can see special cases where I want the node to work alone
>>> for a period of time despite the danger.
>>>
>>
>> Hi John,
>>
>> How about 'pcs quorum unblock'?
>>
>> Regards,
>> Tomas
>>
> 
> 
> Tomas,
> 
> Thank you for the suggestion.  However it didn't work.  It returned:
> Error: unable to check quorum status
>   crm_mon: Error: cluster is not available on this node
> I checked pacemaker, just in case, and it still isn't running.
> 

Either pacemaker, or some service it depends on, attempted to start and
failed, or systemd is still waiting for some service that is required
before pacemaker. Check the logs or provide "journalctl -b" output in this
state.
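[Editorial note: a few concrete commands for the check Andrei suggests; all standard systemd/journald tooling:]

```shell
# See whether pacemaker (or a dependency) failed, or is still queued
systemctl status corosync pacemaker

# List the units pacemaker waits on before it can start
systemctl list-dependencies pacemaker

# Boot log for this boot, filtered to the cluster stack
journalctl -b -u corosync -u pacemaker
```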


> I'm very curious how I could convince the cluster to start its resources on
> one node in the event that the other node is not able to boot.  But I'm
> afraid the answer is either to use fencing, or to add a third node to the
> cluster, or both.
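[Editorial note: a middle ground between a full third node and living with the two_node limitations is a corosync quorum device (qdevice) on a small third host, which supplies the tie-breaking vote. A sketch, assuming a hypothetical qnetd host named "qnetd-host":]

```shell
# On the third host: install and start the quorum network daemon
#   dnf install corosync-qnetd && systemctl enable --now corosync-qnetd

# On one cluster node: register the device with the cluster
pcs quorum device add model net host=qnetd-host algorithm=ffsplit
```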
> 
> -John
> 
> 
>>> Thank you again.
>>>
>>>
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>
>>>>
>>>
>>>
>>
> 
> 


