[ClusterLabs] shutdown of 2-Node cluster when power outage

Tomas Jelinek tojeline at redhat.com
Tue Apr 23 03:28:54 EDT 2019


On 21. 04. 19 at 15:46, Andrei Borzenkov wrote:
> 21.04.2019 16:32, Lentes, Bernd wrote:
>> ----- On 21. Apr 2019 at 6:51, Andrei Borzenkov arvidjaar at gmail.com wrote:
>>
>>> 20.04.2019 22:29, Lentes, Bernd wrote:
>>>>
>>>>
>>>> ----- On 18. Apr 2019 at 16:21, kgaillot kgaillot at redhat.com wrote:
>>>>
>>>>>
>>>>> Simply stopping pacemaker and corosync by whatever mechanism your
>>>>> distribution uses (e.g. systemctl) should be sufficient.
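
For example, with systemd that would be something like this on each node
(assuming the usual unit names):

    systemctl stop pacemaker
    systemctl stop corosync

or, if you are using pcs, "pcs cluster stop --all" run on one node should
stop the cluster services on all nodes.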
>>>>
>>>> That works. But strangely, after a reboot both nodes are
>>>> shown as UNCLEAN. Does the cluster not remember that it was shut down
>>>> cleanly?
>>>
>>> No. Pacemaker does not care what state the cluster was in during the last
>>> shutdown. What matters is what state the cluster is in now.
>>
>> Aah.
>>   
>>>> The problem is that after starting pacemaker and corosync on one node, the
>>>> other is fenced because of that. (pacemaker and corosync aren't started
>>>> automatically by systemd.)
>>>>
>>>
>>> That is correct and expected behavior. If a node still has not appeared
>>> after the timeout, pacemaker assumes the node is faulted and attempts to
>>> proceed with the remaining nodes (after all, this is about _availability_,
>>> and waiting indefinitely means resources won't be available). For this it
>>> needs to ascertain the state of the missing node, so pacemaker attempts to
>>> stonith it. Otherwise each node could attempt to start resources, resulting
>>> in split brain and data corruption.
>>>
>>> Either start pacemaker on all nodes at the same time (with reasonable
>>> fuzz; running "systemctl start pacemaker" in several terminal windows
>>> sequentially should be enough) or set the wait_for_all option in the
>>> corosync configuration. Note that if you have a two-node cluster, the
>>> two_node corosync option also implies wait_for_all.
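
For reference, a minimal quorum section in /etc/corosync/corosync.conf along
these lines (only an illustrative sketch, not a complete configuration):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        # two_node already implies wait_for_all; setting it explicitly is optional
        wait_for_all: 1
    }

Note that corosync.conf has to be kept identical on all nodes, and some quorum
options only take effect after corosync is restarted.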
>>
>>
>> Hi,
>>
>> but what if one node has e.g. a hardware failure and I have to wait for the spare part?
>> With wait_for_all it can't start the resources.
> 
> Wait_for_all is only considered during initial startup. Once the cluster is
> up, a node can fail and pacemaker will fail over resources as appropriate.
> When the node comes back it will join the cluster.
> 
> If your question is how to start an incomplete cluster, well, you can
> temporarily unset wait_for_all, or you can remove the node from the cluster
> and add it back when it becomes available.

Or you can simply run "pcs quorum unblock", or "pcs cluster quorum
unblock" in old pcs versions.


> Or you can make sure you start pacemaker on all nodes simultaneously.
> You do it manually anyway, so what prevents you from starting pacemaker
> on all nodes at nearly the same time? If you are using pcs, "pcs cluster
> start --all" should do it for you.
> 
> Or you can live with extra stonith.
> 
> In the end it is up to you to decide which action plan is most
> appropriate. What you cannot have is a computer reading your mind and
> knowing when it is safe to ignore a missing node.