[ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

Wed Jan 23 09:20:04 EST 2019

On 01/23/2019 01:53 PM, Lentes, Bernd wrote:
>
> ----- On Jan 22, 2019, at 6:00 PM, kgaillot kgaillot at redhat.com wrote:
>
>> On Tue, 2019-01-22 at 16:52 +0100, Lentes, Bernd wrote:
>
>>> Now comes the restart, which is what gives me trouble.
>>> Currently I want to restart the cluster manually, because I'm not
>>> completely familiar with Pacemaker and a bit afraid of ending up in
>>> situations caused by automation that I didn't think of beforehand.
>>>
>>> I can do that from anywhere because both nodes have iLO cards.
>>>
>>> I start, e.g., node1 with the power button.
>>>
>>> systemctl start corosync
>>> systemctl start pacemaker
>>>   corosync and pacemaker don't start automatically; I read that
>>> several times as a recommendation.
>>> Now my first problem: let's assume the other node is broken, but I
>>> still want to get the resources running. My no-quorum-policy is
>>> ignore, which should be fine. But with this setup the resources
>>> don't start automatically.
>> I'm guessing you have corosync 2's wait_for_all set (probably
>> implicitly by two_node). This is a safeguard for the situation where
>> both nodes are booted up but can't see each other.
>>
>> If you're sure the other node is down, you can disable wait_for_all
>> before starting the node. (I'm not sure if this can be changed while
>> corosync is already running.)
>>
> Hi Ken,
>
> I have corosync-2.3.6-9.13.1.x86_64.
> Where can I configure this value?

You mean two_node & wait_for_all?
Those are configured in the quorum section of corosync.conf:

quorum {
...
   wait_for_all: 1
   two_node: 1
...
}
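For the case Ken describes (the peer is known to be down and the surviving node should start resources on its own), votequorum(5) documents that the wait_for_all implied by two_node: 1 can be overridden by setting it explicitly to 0. A minimal sketch of such a temporary quorum section, assuming corosync 2 with the votequorum provider; it removes the safeguard against both nodes starting without seeing each other, so it should be reverted once the peer is repaired:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
    # Temporary: the peer is known to be down. Remove this line
    # (or set it back to 1) once both nodes are healthy again.
    wait_for_all: 0
}
```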

As Ken mentioned, two_node already implies wait_for_all.
Depending on the high-level tooling you are using, that
configuration might already be taken care of.
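Because the tooling may generate corosync.conf for you, it can help to check what a given quorum section means in effect. The function below is a hypothetical helper (not part of corosync) that applies the rule from votequorum(5): two_node: 1 turns wait_for_all on implicitly unless wait_for_all is set explicitly. It is a rough parser that assumes one "key: value" per line and no nested blocks inside quorum { }:

```shell
# Hypothetical helper (not part of corosync): print the wait_for_all value
# votequorum would effectively use for the quorum { } section of the
# corosync.conf given as $1. Assumed rule (votequorum(5)): two_node: 1
# enables wait_for_all implicitly unless wait_for_all is set explicitly.
effective_wait_for_all() {
    awk '
        # Enter the quorum section.
        /^[ \t]*quorum[ \t]*[{]/ { inq = 1; next }
        # Leave it at the first closing brace (no nested blocks assumed).
        inq && /[}]/ { inq = 0; next }
        inq {
            sub(/#.*/, "")                  # drop comments
            if (index($0, ":") == 0) next   # skip lines without "key: value"
            key = $0; sub(/:.*/, "", key); gsub(/[ \t]/, "", key)
            val = $0; sub(/^[^:]*:/, "", val); gsub(/[ \t]/, "", val)
            if (key == "two_node") two = val
            if (key == "wait_for_all") wfa = val
        }
        END {
            if (wfa != "") print wfa        # an explicit setting wins
            else if (two == "1") print 1    # implied by two_node
            else print 0                    # votequorum default
        }
    ' "$1"
}
```

For example, `effective_wait_for_all /etc/corosync/corosync.conf` prints 1 for a section that sets two_node: 1 without an explicit wait_for_all, and 0 once wait_for_all: 0 is added.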

And yes, dynamic configuration of two_node should be possible:
I remember having to implement that communication with
corosync in sbd for clusters that are expanded node by node
using pcs.
Use 'corosync-cfgtool -R' to reload the config.
Using 'corosync-cmapctl' to display or directly set keys should
work as well.
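To see whether a reload or a 'corosync-cmapctl' change (for example reading a key with 'corosync-cmapctl -g quorum.two_node') is actually in effect, 'corosync-quorumtool -s' reports the active votequorum flags, such as 2Node and WaitForAll, on a "Flags:" line. The helper below is a hypothetical sketch that assumes exactly that line format; it reads the tool's output on stdin and succeeds only if the requested flag is listed:

```shell
# Hypothetical helper: succeed (exit 0) if the flag named in $1 appears on
# the "Flags:" line of `corosync-quorumtool -s` output read from stdin,
# otherwise fail (exit 1). Assumes a line such as
#   Flags:            2Node Quorate WaitForAll
has_quorum_flag() {
    awk -v want="$1" '
        $1 == "Flags:" {
            for (i = 2; i <= NF; i++)
                if ($i == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

On a running node this might be used as `corosync-quorumtool -s | has_quorum_flag WaitForAll && echo "wait_for_all is enabled"`.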

You mentioned 'no-quorum-policy = ignore' before.
It wasn't clear whether you have that set at all times. I have seen
howtos suggesting that instead of two_node (probably
coming from times when corosync didn't have 'two_node'
or when quorum was derived by pacemaker directly).
By the way, you probably shouldn't use 'ignore', so that the nodes
can't come up in parallel without seeing each other, as Ken
mentioned before.
On the other hand, startup fencing (as you've experienced)
would prevent that as well.
But with 'no-quorum-policy = ignore', a node coming up
without a connection to its peer would immediately try
to fence the peer, which you definitely wouldn't want
if the peer is working properly.
You've probably set up fencing with a random delay, or with fixed
delays that differ for each target node.
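Why differing delays matter can be seen in a toy timing model (an illustration only, not how Pacemaker actually schedules fencing): both nodes lose contact at t=0 and each immediately requests fencing of the other, fencing of each target is held back by a per-target delay, and a shot needs a few seconds to power its target off. A node that is already off never sends its own shot. In Pacemaker, such per-target delays correspond to fence-device parameters like pcmk_delay_base (fixed) or pcmk_delay_max (random); the function and variable names below are hypothetical:

```shell
# Toy model, not Pacemaker code. Arguments (whole seconds):
#   $1  delay before node1 is fenced (applied to node2's request)
#   $2  delay before node2 is fenced (applied to node1's request)
#   $3  time a fence action needs to power its target off
# Prints which node ends up fenced: node1, node2, or both.
fence_race() {
    delay_node1=$1
    delay_node2=$2
    shot_secs=$3
    if [ $((delay_node1 + shot_secs)) -le "$delay_node2" ]; then
        # node1 is already off before its own shot at node2 would fire.
        echo node1
    elif [ $((delay_node2 + shot_secs)) -le "$delay_node1" ]; then
        # Mirror case: node2 is off before it can shoot node1.
        echo node2
    else
        # Delays too close together: both shots go out, both nodes go down.
        echo both
    fi
}
```

With equal delays, `fence_race 0 0 5` prints both; giving the preferred survivor the larger delay on its own fencing, e.g. `fence_race 15 0 5`, fences only node2. A random delay makes exactly equal values unlikely, although in this model two random delays can still land closer together than the shot time.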

Klaus

>
> Bernd
>  
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDirig'in Petra Steiner-Hoffmann
> Stellv.Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
> Geschaeftsfuehrer: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org