[ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS
kwenning at redhat.com
Wed Jan 23 09:20:04 EST 2019
On 01/23/2019 01:53 PM, Lentes, Bernd wrote:
> ----- On Jan 22, 2019, at 6:00 PM, kgaillot kgaillot at redhat.com wrote:
>> On Tue, 2019-01-22 at 16:52 +0100, Lentes, Bernd wrote:
>>> Now the restart, which is giving me trouble.
>>> Currently I want to restart the cluster manually, because I'm not
>>> completely familiar with pacemaker and a bit afraid of getting
>>> into situations, due to automation, that I didn't think of before.
>>> I can do that from anywhere because both nodes have ILO cards.
>>> I start e.g. node1 with the power button.
>>> systemctl start corosync
>>> systemctl start pacemaker
>>> corosync and pacemaker don't start automatically; I have read that
>>> several times as a recommendation.
>>> Now my first problem. Let's assume the other node is broken, but I
>>> still want to get resources running. My no-quorum-policy is ignore.
>>> That should be fine. But I have this setup now and can't get the
>>> resources running.
>> I'm guessing you have corosync 2's wait_for_all set (probably
>> implicitly by two_node). This is a safeguard for the situation where
>> both nodes are booted up but can't see each other.
>> If you're sure the other node is down, you can disable wait_for_all
>> before starting the node. (I'm not sure if this can be changed while
>> corosync is already running.)
> Hi Ken,
> I have corosync-2.3.6-9.13.1.x86_64.
> Where can I configure this value?
Are you asking about two_node & wait_for_all?
Both are configured in the quorum section of corosync.conf.
As Ken mentioned, two_node already implies wait_for_all.
Depending on the high-level tooling you are using, it might
already take care of that configuration.
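
For illustration, a two-node quorum section might look like the sketch
below. The option names are the ones documented in votequorum(5); the
scratch file path is only an assumption, chosen so that the real
corosync.conf (usually under /etc/corosync) is not touched before the
fragment has been reviewed. Explicitly setting wait_for_all to 0
overrides the default implied by two_node, and that should only be done
when the peer is known to be down.

```shell
#!/bin/sh
# Write a sample quorum section to a scratch file for review.
# The path below is hypothetical; it is NOT the live corosync.conf.
QUORUM_SNIPPET="${QUORUM_SNIPPET:-/tmp/corosync-quorum-example.conf}"

cat > "$QUORUM_SNIPPET" <<'EOF'
quorum {
    provider: corosync_votequorum
    # two_node: 1 implies wait_for_all: 1 unless overridden
    two_node: 1
    # Uncomment only when the peer is known to be down:
    # wait_for_all: 0
}
EOF

# Show the fragment so it can be compared with the existing config.
cat "$QUORUM_SNIPPET"
```

The fragment would still have to be merged into the existing quorum
section by hand or by the cluster tooling in use.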
And yes, dynamic configuration of two_node should be possible.
I remember that I had to implement that communication with
corosync in sbd for clusters that are expanded node by node.
Use 'corosync-cfgtool -R' to reload the config.
Using 'corosync-cmapctl' to display or directly set keys should
work as well.
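
The commands mentioned above could be wrapped roughly as follows. This
is a sketch, not a tested procedure: the function names are made up for
illustration, the u8 type for quorum.two_node is an assumption based on
how corosync-cmapctl usually lists that key, and whether a change made
at runtime actually takes effect is exactly the open question raised
earlier in this thread. Only the read-only function is called.

```shell
#!/bin/sh
# Helpers around corosync-cmapctl / corosync-cfgtool (names are hypothetical).

show_quorum_settings() {
    if ! command -v corosync-cmapctl >/dev/null 2>&1; then
        echo "corosync-cmapctl not found; nothing to do"
        return 0
    fi
    # Dump the in-memory configuration and keep only the quorum keys.
    corosync-cmapctl | grep -i 'quorum' \
        || echo "no quorum keys found (is corosync running?)"
    # Summary view; its flags line lists e.g. 2Node and WaitForAll.
    corosync-quorumtool -s || true
}

set_two_node() {
    # Set the key directly in the running instance (value: 0 or 1).
    # Assumption: the key is stored as u8.
    corosync-cmapctl -s quorum.two_node u8 "$1"
}

reload_corosync_config() {
    # Ask the running corosync to re-read corosync.conf.
    corosync-cfgtool -R
}

show_quorum_settings
```

Checking the output of show_quorum_settings before and after a reload
is a cheap way to see whether the change was picked up.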
You mentioned 'no-quorum-policy = ignore' before.
It wasn't clear to me whether you have that set at all times.
I have seen howtos suggesting that instead of two_node (probably
dating from times when corosync didn't have 'two_node'
or when quorum was derived by pacemaker directly).
Btw. you probably shouldn't use 'ignore', so that the nodes can't
come up in parallel without seeing each other, as Ken described.
On the other hand, startup fencing, as you've experienced,
would prevent that as well.
But with 'no-quorum-policy = ignore' a node coming up
without connection to the peer would immediately try
to fence the peer - which you definitely wouldn't want
if that one is working properly.
You've probably set up fencing with a random delay or with fixed
delays that differ for each target node.
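
As a rough sketch of the random-delay variant: pcmk_delay_max is the
Pacemaker fencing parameter for a random delay, and crm_resource can
set a resource parameter. The resource id "fence-node1" and the 15s
value are purely hypothetical, and the script only prints the command
unless APPLY=1 is set, so nothing in the cluster is changed by default.

```shell
#!/bin/sh
# Set a random fencing delay on a stonith resource (dry run by default).
# "fence-node1" and "15s" are illustrative values, not taken from this thread.

set_fence_delay() {
    rsc="$1"
    delay="$2"
    cmd="crm_resource --resource $rsc --set-parameter pcmk_delay_max --parameter-value $delay"
    if [ "${APPLY:-0}" = "1" ]; then
        # Really change the cluster configuration.
        $cmd
    else
        echo "would run: $cmd"
    fi
}

set_fence_delay fence-node1 15s
```

For fixed delays, the same pattern could be applied with a different
value per fencing resource, so that the two nodes do not shoot each
other at the same moment.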