[ClusterLabs] Can a two node cluster start resources if only one node is booted?

john tillman johnt at panix.com
Thu Apr 21 14:18:38 EDT 2022


> On 21.04.2022 18:26, john tillman wrote:
>>> On 20. 04. 22 at 20:21, john tillman wrote:
>>>>> On 20.04.2022 19:53, john tillman wrote:
>>>>>> I have a two node cluster that won't start any resources if only one
>>>>>> node is booted; the pacemaker service does not start.
>>>>>>
>>>>>> Once the second node boots up, the first node will start pacemaker
>>>>>> and the resources are started.  All is well.  But I would like the
>>>>>> resources to start when the first node boots by itself.
>>>>>>
>>>>>> I thought the problem was with the wait_for_all option but I have it
>>>>>> set to "0".
>>>>>>
>>>>>> On the node that is booted by itself, when I run "corosync-quorumtool"
>>>>>> I see:
>>>>>>
>>>>>>     [root at test00 ~]# corosync-quorumtool
>>>>>>     Quorum information
>>>>>>     ------------------
>>>>>>     Date:             Wed Apr 20 16:05:07 2022
>>>>>>     Quorum provider:  corosync_votequorum
>>>>>>     Nodes:            1
>>>>>>     Node ID:          1
>>>>>>     Ring ID:          1.2f
>>>>>>     Quorate:          Yes
>>>>>>
>>>>>>     Votequorum information
>>>>>>     ----------------------
>>>>>>     Expected votes:   2
>>>>>>     Highest expected: 2
>>>>>>     Total votes:      1
>>>>>>     Quorum:           1
>>>>>>     Flags:            2Node Quorate
>>>>>>
>>>>>>     Membership information
>>>>>>     ----------------------
>>>>>>         Nodeid      Votes Name
>>>>>>              1          1 test00 (local)
>>>>>>
>>>>>>
>>>>>> My config file looks like this:
>>>>>>     totem {
>>>>>>         version: 2
>>>>>>         cluster_name: testha
>>>>>>         transport: knet
>>>>>>         crypto_cipher: aes256
>>>>>>         crypto_hash: sha256
>>>>>>     }
>>>>>>
>>>>>>     nodelist {
>>>>>>         node {
>>>>>>             ring0_addr: test00
>>>>>>             name: test00
>>>>>>             nodeid: 1
>>>>>>         }
>>>>>>
>>>>>>         node {
>>>>>>             ring0_addr: test01
>>>>>>             name: test01
>>>>>>             nodeid: 2
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     quorum {
>>>>>>         provider: corosync_votequorum
>>>>>>         two_node: 1
>>>>>>         wait_for_all: 0
>>>>>>     }
>>>>>>
>>>>>>     logging {
>>>>>>         to_logfile: yes
>>>>>>         logfile: /var/log/cluster/corosync.log
>>>>>>         to_syslog: yes
>>>>>>         timestamp: on
>>>>>>         debug: on
>>>>>>         syslog_priority: debug
>>>>>>         logfile_priority: debug
>>>>>>     }
>>>>>>
>>>>>> Fencing is disabled.
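
A quick way to double-check what corosync actually loaded is to dump the
votequorum keys, since two_node: 1 normally turns wait_for_all back on
unless it is explicitly set to 0 (as it is here).  A minimal sketch,
assuming corosync is running on the lone node:

    corosync-cmapctl | grep -i votequorum   # config and runtime votequorum keys
    corosync-quorumtool -s                  # "2Node" without "WaitForAll" matches the config above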
>>>>>>
>>>>>
>>>>> That won't work.
>>>>>
>>>>>> I've also looked in "corosync.log" but I don't know what to look for
>>>>>> to diagnose this issue.  I mean there are many lines similar to:
>>>>>> [QUORUM] This node is within the primary component and will provide
>>>>>> service.
>>>>>> and
>>>>>> [VOTEQ ] Sending quorum callback, quorate = 1
>>>>>> and
>>>>>> [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes
>>>>>> Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
>>>>>>
>>>>>> Is there something specific I should look for in the log?
>>>>>>
>>>>>> So can a two node cluster work after booting only one node?  Maybe it
>>>>>> never will and I am wasting a lot of time, yours and mine.
>>>>>>
>>>>>> If it can, what else can I investigate further?
>>>>>>
>>>>>
>>>>> Before a node can start handling resources it needs to know the status
>>>>> of the other node.  Without successful fencing there is no way to
>>>>> accomplish that.
>>>>>
>>>>> Yes, you can tell pacemaker to ignore unknown status.  Depending on your
>>>>> resources this could merely prevent normal operation, or it could lead
>>>>> to data corruption.
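
The usual knobs for that are standard Pacemaker cluster properties; a
minimal sketch, assuming a pcs-managed cluster, and with the obvious caveat
that all of these trade safety for availability and invite split brain:

    pcs property set no-quorum-policy=ignore   # keep managing resources without quorum
    pcs property set startup-fencing=false     # assume nodes never seen at startup are down
    pcs property set stonith-enabled=false     # already the case here, since fencing is disabled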
>>>>
>>>>
>>>> Makes sense.  Thank you.
>>>>
>>>> Perhaps some future enhancement could allow for this situation?  I mean,
>>>> it might be desirable in some cases to allow a single node to boot,
>>>> determine quorum via two_node=1 and wait_for_all=0, and start resources
>>>> without ever seeing the other node.  Sure, there are dangers of split
>>>> brain, but I can see special cases where I want the node to work alone
>>>> for a period of time despite the danger.
>>>>
>>>
>>> Hi John,
>>>
>>> How about 'pcs quorum unblock'?
>>>
>>> Regards,
>>> Tomas
>>>
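As I understand it, 'pcs quorum unblock' talks to the running cluster stack
and tells it to stop waiting for nodes it has never seen, so the local
services have to be up before it can do anything; a rough sketch:

    pcs cluster start            # start corosync and pacemaker on this node only
    pcs quorum unblock --force   # then cancel the wait for the unseen node
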
>>
>>
>> Tomas,
>>
>> Thank you for the suggestion.  However, it didn't work.  It returned:
>> Error: unable to check quorum status
>>   crm_mon: Error: cluster is not available on this node
>> I checked pacemaker, just in case, and it still isn't running.
>>
>
> Either pacemaker or some service it depends upon attempted to start and
> failed, or systemd is still waiting for some service that is required
> before pacemaker.  Check the logs or provide "journalctl -b" output in
> this state.
>
>
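
A sketch of the kind of checks that should show that, assuming
systemd-managed services:

    systemctl status corosync pacemaker pcsd   # did anything fail or stay inactive?
    systemctl is-enabled corosync pacemaker    # are they set to start at boot at all?
    journalctl -b -u corosync -u pacemaker     # per-unit view of this boot's logs
    systemctl list-dependencies pacemaker      # what pacemaker waits for before it starts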


I looked at pacemaker's log and it does not have any updates since the
system was shut down.  When we booted the node, if pacemaker had started and
failed, or had started and been stopped by systemd, there would be something
in this log, no?

journalctl -b is lengthy and I'd rather not attach it here, but I grep'd
through it and can't find any pacemaker references.  No errors reported
from systemd.

Once the other node is started, something starts the pacemaker service.
The pacemaker log starts filling up.  journalctl -b sees plenty of pacemaker
entries.  crm_mon and pcs status work correctly and show the cluster in a
good state with all resources started properly.

So I don't see anything stopping pacemaker from starting at boot.  It looks
like some piece of cluster software is starting it once the second node is
online.  Maybe corosync?  The corosync log doesn't mention starting
anything, though; all it logs is seeing the second node join.

So what starts pacemaker in this case?
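
One guess is that pcsd does it, since 'pcs cluster start --all' run on
either node contacts pcsd on every node to start the cluster services
there.  A sketch of how one might check, assuming systemd units and pcsd:

    systemctl is-enabled corosync pacemaker   # if "disabled", nothing starts them at boot
    systemctl status pacemaker                # the Active: timestamp shows when it came up
    journalctl -b -u pcsd -u pacemaker        # look for a pcsd request just before pacemaker starts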

Thank you for the response.

-John


>> I'm very curious how I could convince the cluster to start its resources
>> on one node in the event that the other node is not able to boot.  But I'm
>> afraid the answer is to use fencing, to add a third node to the cluster,
>> or both.
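
A lighter-weight variant of the third-node option would be a quorum device;
a rough sketch, assuming a separate machine (hypothetically named
qnetd-host) running corosync-qnetd, with the corosync-qdevice package
installed on both cluster nodes:

    # qnetd-host is a placeholder name for the arbiter machine
    pcs quorum device add model net host=qnetd-host algorithm=ffsplit
    pcs quorum status   # should now show the qdevice vote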
>>
>> -John
>>
>>
>>>> Thank you again.
>>>>
>>>>




More information about the Users mailing list