[ClusterLabs] Antw: [EXT] Cannot add a node with pcs

Piotr Szafarczyk piotr-l at netexpert.pl
Wed Jul 13 12:59:33 EDT 2022


Hi Ulrich,

Thank you. I am perfectly aware that operating without stonith is not a 
good idea :). I am sure I will add it. But first I need to understand 
the current state. I am afraid of introducing something new before I fix 
the current problem.

Best regards,
Piotr

On 13.07.2022 08:00, Ulrich Windl wrote:
>>>> Piotr Szafarczyk <piotr-l at netexpert.pl> wrote on 12.07.2022 at 12:34 in
> message <38ccc24a-7b01-561c-20f8-ec2273a1894f at netexpert.pl>:
>> Hi,
>>
>> I used to have a working cluster with 3 nodes (and stonith disabled).
> The SLES guide says:
> Important: No Support Without STONITH
> You must have a node fencing mechanism for your cluster.
> The global cluster options stonith-enabled and startup-fencing must be
> set to true. When you change them, you lose support.
>
> Maybe that helps.
>
>> After an unexpected restart of one node, the cluster split. Node #2
>> started to see the others as unclean. Nodes 1 and 3 were cooperating
>> with each other, showing #2 as offline. There were no network connection
>> problems.
>>
>> I removed #2 (operating from #1) with
>> pcs cluster node remove n2
>>
>> I verified that it had removed all configuration from #2, both for
>> corosync and for pacemaker. The cluster appears to be working correctly
>> with two nodes (and no traces of #2).
>>
>> Now I am trying to add the third node back.
>> pcs cluster node add n2
>> Disabling SBD service...
>> n2: sbd disabled
>> Sending 'corosync authkey', 'pacemaker authkey' to 'n2'
>> n2: successful distribution of the file 'corosync authkey'
>> n2: successful distribution of the file 'pacemaker authkey'
>> Sending updated corosync.conf to nodes...
>> n3: Succeeded
>> n2: Succeeded
>> n1: Succeeded
>> n3: Corosync configuration reloaded
>>
>> I am able to start #2, operating from #1:
>>
>> pcs cluster pcsd-status
>>     n2: Online
>>     n3: Online
>>     n1: Online
>>
>> pcs cluster enable n2
>> pcs cluster start n2
>>
>> I can see that corosync's configuration has been updated, but
>> pacemaker's has not.
>>
>> _Checking from #1:_
>>
>> pcs config
>> Cluster Name: n
>> Corosync Nodes:
>>    n1 n3 n2
>> Pacemaker Nodes:
>>    n1 n3
>> [...]
>>
>> pcs status
>>     * 2 nodes configured
>> Node List:
>>     * Online: [ n1 n3 ]
>> [...]
>>
>> pcs cluster cib scope=nodes
>> <nodes>
>>     <node id="1" uname="n1"/>
>>     <node id="3" uname="n3"/>
>> </nodes>
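
The mismatch in the `pcs config` output from #1 above can be checked mechanically. This is only a minimal sketch, assuming the output keeps the layout shown (a `Corosync Nodes:` or `Pacemaker Nodes:` header followed by a single line of names). `nodes_under` is a hypothetical helper, and the sample text is copied from the output above rather than read from a live cluster.

```shell
# Hypothetical helper: read `pcs config` text on stdin and print, one per
# line and sorted, the node names found on the line after the given header.
nodes_under() {
    awk -v header="$1" '
        found { for (i = 1; i <= NF; i++) print $i; exit }
        $0 == header { found = 1 }
    ' | sort
}

# Sample copied from the output on #1; on a live node this text would
# come from running `pcs config`.
config_n1='Cluster Name: n
Corosync Nodes:
   n1 n3 n2
Pacemaker Nodes:
   n1 n3'

corosync_nodes=$(printf '%s\n' "$config_n1" | nodes_under 'Corosync Nodes:')
pacemaker_nodes=$(printf '%s\n' "$config_n1" | nodes_under 'Pacemaker Nodes:')

# Names known to corosync that pacemaker does not list.
missing=$(printf '%s\n' "$corosync_nodes" | grep -vxF -e "$pacemaker_nodes")
echo "Known to corosync but not to pacemaker: $missing"
```

With the sample above, the only name left over is n2, which matches what #1 reports.
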
>>
>> _#2 sees the state differently:_
>>
>> pcs config
>> Cluster Name: n
>> Corosync Nodes:
>>    n1 n3 n2
>> Pacemaker Nodes:
>>    n1 n2 n3
>>
>> pcs status
>>     * 3 nodes configured
>> Node List:
>>     * Online: [ n2 ]
>>     * OFFLINE: [ n1 n3 ]
>> Full List of Resources:
>>     * No resources
>> [...]
>> (there are resources configured on #1 and #3)
>>
>> pcs cluster cib scope=nodes
>> <nodes>
>>     <node id="1" uname="n1"/>
>>     <node id="3" uname="n3"/>
>>     <node id="2" uname="n2"/>
>> </nodes>
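
The two `<nodes>` sections above can be compared directly to see which entries differ between #1 and #2. This is a rough sketch, assuming each `<node>` element sits on its own line as in the output shown. `node_names` is a hypothetical helper, and the fragments are pasted in as sample data instead of being fetched from the nodes.

```shell
# Hypothetical helper: read a <nodes> fragment on stdin and print each
# uname attribute value on its own line, sorted.
# Assumes one <node> element per line.
node_names() {
    sed -n 's/.*uname="\([^"]*\)".*/\1/p' | sort
}

# Sample fragments copied from the two outputs above. On a live cluster
# each would come from `pcs cluster cib scope=nodes` run on that node.
cib_n1='<nodes>
    <node id="1" uname="n1"/>
    <node id="3" uname="n3"/>
</nodes>'
cib_n2='<nodes>
    <node id="1" uname="n1"/>
    <node id="3" uname="n3"/>
    <node id="2" uname="n2"/>
</nodes>'

names_n1=$(printf '%s\n' "$cib_n1" | node_names)
names_n2=$(printf '%s\n' "$cib_n2" | node_names)

# Names present in the CIB on #2 but absent from the CIB on #1.
only_on_n2=$(printf '%s\n' "$names_n2" | grep -vxF -e "$names_n1")
echo "Only in the CIB on #2: $only_on_n2"
```

A non-empty result means the two nodes are not sharing one CIB, which fits #2 reporting n1 and n3 as OFFLINE.
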
>>
>> Please help me diagnose this. Where should I look for the problem? (I have
>> already tried a few more things: I see nothing helpful in the log files,
>> pcs --debug shows nothing suspicious, and I even tried editing the CIB manually.)
>>
>> Best regards,
>>
>> Piotr Szafarczyk