[ClusterLabs] shutdown of 2-Node cluster when power outage

Andrei Borzenkov arvidjaar at gmail.com
Sun Apr 21 09:46:44 EDT 2019


On 21.04.2019 16:32, Lentes, Bernd wrote:
> ----- On 21. Apr 2019 at 6:51, Andrei Borzenkov arvidjaar at gmail.com wrote:
> 
>> On 20.04.2019 22:29, Lentes, Bernd wrote:
>>>
>>>
>>> ----- On 18. Apr 2019 at 16:21, kgaillot kgaillot at redhat.com wrote:
>>>
>>>>
>>>> Simply stopping pacemaker and corosync by whatever mechanism your
>>>> distribution uses (e.g. systemctl) should be sufficient.
>>>
>>> That works. But strangely, after a reboot both nodes are
>>> shown as UNCLEAN. Does the cluster not remember that it has been
>>> shut down cleanly?
>>
>> No. Pacemaker does not care what state the cluster was in during the
>> last shutdown. What matters is what state the cluster is in now.
> 
> Aah.
>  
>>> The problem is that after starting pacemaker and corosync on one node,
>>> the other is fenced because of that (pacemaker and corosync aren't
>>> started automatically by systemd).
>>>
>>
>> That is correct and expected behavior. If a node still has not appeared
>> after the timeout, pacemaker assumes the node is faulted and attempts to
>> proceed with the remaining nodes (after all, it is about _availability_,
>> and waiting indefinitely means resources won't be available). For this it
>> needs to ascertain the state of the missing node, so pacemaker attempts
>> to stonith it. Otherwise each node could attempt to start resources,
>> resulting in split brain and data corruption.
>>
>> Either start pacemaker on all nodes at the same time (with reasonable
>> fuzz; doing "systemctl start pacemaker" in several terminal windows
>> sequentially should be enough) or set the wait_for_all option in the
>> corosync configuration. Note that if you have a two-node cluster, the
>> two_node corosync option also implies wait_for_all.
> 
> 
> Hi,
> 
> but what if one node has e.g. a hardware failure and I have to wait
> for the spare part? With wait_for_all the surviving node can't start
> the resources.

wait_for_all is only considered during initial startup. Once the cluster
is up, a node can fail and pacemaker will fail over resources as
appropriate. When the node comes back, it will rejoin the cluster.
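For reference, both options live in the quorum section of corosync.conf.
A minimal sketch of that section, assuming the stock corosync_votequorum
provider and the default path /etc/corosync/corosync.conf:

    quorum {
        provider: corosync_votequorum
        # Relax quorum rules for a 2-node cluster. Setting two_node: 1
        # also enables wait_for_all implicitly.
        two_node: 1
        # An explicit wait_for_all: 0 would override that implicit default:
        # wait_for_all: 0
    }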

If your question is how to start an incomplete cluster: you can
temporarily unset wait_for_all, or you can remove the node from the
cluster and add it back when it becomes available.
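A rough sketch of those two options; the node name node2 is a
placeholder, and the exact pcs syntax varies between pcs versions:

    # Option A: temporarily override wait_for_all on the surviving node by
    # setting "wait_for_all: 0" in the quorum section of
    # /etc/corosync/corosync.conf, then start the stack:
    systemctl start corosync pacemaker

    # Option B: drop the failed node from the cluster definition, and
    # re-add it later once the hardware is repaired:
    pcs cluster node remove node2
    pcs cluster node add node2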

Or you can make sure you start pacemaker on all nodes simultaneously.
You do it manually anyway, so what prevents you from starting pacemaker
on all nodes at nearly the same time? If you are using pcs, "pcs cluster
start --all" should do it for you.
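For clusters managed without pcs, a parallel ssh loop gets close enough
to simultaneous. This is only a sketch; the node names node1/node2 and
passwordless root ssh between the machines are assumptions:

    # Start the stack on both nodes at (almost) the same time.
    for n in node1 node2; do
        ssh "$n" 'systemctl start corosync pacemaker' &
    done
    wait   # return once both remote starts have completed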

Or you can live with the extra stonith.

In the end it is up to you to decide which action plan is most
appropriate. What you cannot have is a computer reading your mind and
knowing when it is safe to ignore a missing node.

