[ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

Thu Jan 24 10:01:03 EST 2019

----- On Jan 23, 2019, at 3:20 PM, Klaus Wenninger kwenning at redhat.com wrote:
>> I have corosync-2.3.6-9.13.1.x86_64.
>> Where can i configure this value ?
> 
> speaking of two_node & wait_for_all?
> That is configured in the quorum-section of corosync.conf:
> 
> quorum {
> ...
>   wait_for_all: 1
>   two_node: 1
> ...
> }
> As Ken mentioned two_node would already imply wait_for_all.
> Dependent on the high-level-tooling you are using that might
> take care of that configuration already.
> 
> Using 'corosync-cmapctl' to display or directly set keys should
> work as well.

corosync-cmapctl -b shows two_node and wait_for_all:

ha-idg-1:~ # corosync-cmapctl -b|grep -iE 'wait|two'
quorum.two_node (u8) = 1
runtime.votequorum.two_node (u8) = 1
runtime.votequorum.wait_for_all_status (u8) = 1

man 5 votequorum is very helpful.
It says:
two_node = 1 sets the quorum to 1.
wait_for_all = 1 requires both nodes to be up simultaneously, at least for a short time, before the cluster can operate.
I see this as a disadvantage. What if one node has a hardware problem that can't be fixed quickly?

> You mentioned 'no-quorum-policy = ignore' before.
> Wasn't clear if you have that set at all times. Have seen
> howtos suggesting that instead of two_node (probably
> coming from times when corosync didn't have 'two_node'
> or when quorum was derived by pacemaker directly).
> Btw. you probably shouldn't use 'ignore' to prevent the nodes
> coming up in parallel without seeing each other - as Ken
> mentioned before.
> On the other hand startup-fencing - as you've experienced -
> would prevent that as well.
> But with 'no-quorum-policy = ignore' a node coming up
> without connection to the peer would immediately try
> to fence the peer - which you definitely wouldn't want
> if that one is working properly.
Yes, I see that.
But corosync and pacemaker aren't started automatically in my setup.
Also, my fencing action is off, not reboot.
Both are there so that I can first check what happened and fix it before starting the fenced node again.
And my corosync connection is a bonding device with cables running directly to the other server, without a switch.

Do you recommend switching off ignore?
But what happens if the cluster is running and one node is fenced?
Without ignore, the resources don't continue to run.
Is there a hierarchy or a mutual exclusion between two_node and no-quorum-policy?
I would say that no-quorum-policy=ignore, two_node=1 and wait_for_all=0 would be the best for a
two-node cluster.
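
A minimal sketch of the corosync side of that combination, assuming a typical votequorum setup (the provider and expected_votes lines are assumptions for illustration, not taken from the configuration discussed here). votequorum(5) notes that two_node: 1 enables wait_for_all automatically, and that wait_for_all can still be overridden by setting it explicitly to 0:

```
quorum {
    # Assumed: the standard votequorum provider.
    provider: corosync_votequorum
    # Assumed: two nodes with one vote each.
    expected_votes: 2
    two_node: 1
    # An explicit 0 overrides the wait_for_all that two_node would otherwise imply.
    wait_for_all: 0
}
```

no-quorum-policy is a Pacemaker cluster property and is set separately, outside corosync.conf.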

> You've probably setup fencing with random-delay or fixed
> delays different for each target-node.

One agent has a delay of 20 seconds, the other has no delay.

Bernd
