[ClusterLabs] Antw: Re: questions about startup fencing

Tue Dec 5 10:05:03 CET 2017

On 4.12.2017 at 17:21, Jehan-Guillaume de Rorthais wrote:
> On Mon, 4 Dec 2017 16:50:47 +0100
> Tomas Jelinek <tojeline at redhat.com> wrote:
> 
>> On 4.12.2017 at 14:21, Jehan-Guillaume de Rorthais wrote:
>>> On Mon, 4 Dec 2017 12:31:06 +0100
>>> Tomas Jelinek <tojeline at redhat.com> wrote:
>>>    
>>>> On 4.12.2017 at 10:36, Jehan-Guillaume de Rorthais wrote:
>>>>> On Fri, 01 Dec 2017 16:34:08 -0600
>>>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>       
>>>>>> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
>>>>>>>
>>>>>>>          
>>>>>>>> Kristoffer Gronlund <kgronlund at suse.com> wrote:
>>>>>>>>> Adam Spiers <aspiers at suse.com> writes:
>>>>>>>>>          
>>>>>>>>>> - The whole cluster is shut down cleanly.
>>>>>>>>>>
>>>>>>>>>> - The whole cluster is then started up again.  (Side question: what
>>>>>>>>>>      happens if the last node to shut down is not the first to start up?
>>>>>>>>>>      How will the cluster ensure it has the most recent version of the
>>>>>>>>>>      CIB?  Without that, how would it know whether the last man standing
>>>>>>>>>>      was shut down cleanly or not?)
>>>>>>>>>
>>>>>>>>> This is my opinion, I don't really know what the "official" pacemaker
>>>>>>>>> stance is: There is no such thing as shutting down a cluster cleanly. A
>>>>>>>>> cluster is a process stretching over multiple nodes - if they all shut
>>>>>>>>> down, the process is gone. When you start up again, you effectively have
>>>>>>>>> a completely new cluster.
>>>>>>>>
>>>>>>>> Sorry, I don't follow you at all here.  When you start the cluster up
>>>>>>>> again, the cluster config from before the shutdown is still there.
>>>>>>>> That's very far from being a completely new cluster :-)
>>>>>>>
>>>>>>> The problem is you cannot "start the cluster" in pacemaker; you can
>>>>>>> only "start nodes". The nodes will come up one by one. As opposed (as
>>>>>>> I had said) to HP Service Guard, where there is a "cluster formation
>>>>>>> timeout". That is, the nodes wait for the specified time for the
>>>>>>> cluster to "form". Then the cluster starts as a whole. Of course that
>>>>>>> only applies if the whole cluster was down, not if a single node was
>>>>>>> down.
>>>>>>
>>>>>> I'm not sure what that would specifically entail, but I'm guessing we
>>>>>> have some of the pieces already:
>>>>>>
>>>>>> - Corosync has a wait_for_all option if you want the cluster to be
>>>>>> unable to have quorum at start-up until every node has joined. I don't
>>>>>> think you can set a timeout that cancels it, though.
>>>>>>
>>>>>> - Pacemaker will wait dc-deadtime for the first DC election to
>>>>>> complete. (if I understand it correctly ...)
>>>>>>
>>>>>> - Higher-level tools can start or stop all nodes together (e.g. pcs has
>>>>>> pcs cluster start/stop --all).
>>>>>
>>>>> Based on this discussion, I have some questions about pcs:
>>>>>
>>>>> * how is it shutting down the cluster when issuing "pcs cluster stop
>>>>> --all"?
>>>>
>>>> First, it sends a request to each node to stop pacemaker. The requests
>>>> are sent in parallel which prevents resources from being moved from node
>>>> to node. Once pacemaker stops on all nodes, corosync is stopped on all
>>>> nodes in the same manner.
>>>
>>> What if, for some external reason (load, network, whatever), one node is
>>> slower than the others and the cluster starts reacting? Sending the requests
>>> in parallel doesn't feel safe enough given all the race conditions that can
>>> occur at the same time.
>>>
>>> Am I missing something?
>>>    
>>
>> If a node gets the request later than others, some resources may be
>> moved to it before it starts shutting down pacemaker as well. Pcs waits
>> for all nodes to shut down pacemaker before it moves to shutting down
>> corosync. This way, quorum is maintained the whole time pacemaker is
>> shutting down and therefore no services are blocked from stopping due to
>> lack of quorum.
> 
> OK, so if admins or RAs expect the cluster to start in the same conditions
> it was shut down in, we have to take care of the shutdown ourselves, by hand.
> Disabling the resources before shutting down might be the best option in
> this situation, as the CRM will take care of switching things off correctly
> in a proper transition.

My understanding is that pacemaker takes care of switching off things 
correctly in a proper transition on its shutdown. So there should be no 
extra care needed. Pacemaker developers, however, need to confirm that.
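
To make the ordering described earlier in this thread concrete, here is a
minimal sketch of the two-phase stop as I understand it: stop pacemaker on
all nodes in parallel, wait for every node to finish, and only then stop
corosync. It is a pure simulation. SimNode and stop_cluster are hypothetical
names made up for illustration; this is not how pcs is implemented, and
nothing here talks to a real cluster.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SimNode:
    """Hypothetical stand-in for a cluster node (not a pcs/pacemaker API)."""

    def __init__(self, name):
        self.name = name
        self.pacemaker_running = True
        self.corosync_running = True


def stop_cluster(nodes):
    """Simulate: stop pacemaker everywhere, wait, then stop corosync."""
    events = []
    lock = threading.Lock()

    def stop_pacemaker(node):
        node.pacemaker_running = False
        with lock:
            events.append(("pacemaker", node.name))

    def stop_corosync(node):
        node.corosync_running = False
        with lock:
            events.append(("corosync", node.name))

    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # Phase 1: requests go out in parallel; list() blocks until
        # every node has finished stopping pacemaker.
        list(pool.map(stop_pacemaker, nodes))
        # corosync is still up everywhere at this point, so in this
        # model quorum was held for the whole pacemaker shutdown.
        assert all(n.corosync_running for n in nodes)
        # Phase 2: only now is corosync stopped, again in parallel.
        list(pool.map(stop_corosync, nodes))
    return events


nodes = [SimNode(n) for n in ("node1", "node2", "node3")]
events = stop_cluster(nodes)
```

The order of nodes within a phase is not deterministic, but no corosync stop
can appear in the event list before the last pacemaker stop. That barrier
between the phases is what keeps services from being blocked by a lack of
quorum.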

> 
> That's fine with me, as a cluster shutdown should be part of a controlled
> procedure. I suppose I have to update my online docs now.
> 
> Thank you for your answers!
>
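
The controlled procedure discussed above (disable the resources first, then
stop the whole cluster) could be written down roughly as follows. This is
only a sketch that builds the command list and does not run anything. The
resource ids are hypothetical placeholders, and whether disabling first is
actually required is exactly the open question in this thread.

```python
def controlled_shutdown_commands(resource_ids):
    """Return the pcs commands for a manual, controlled cluster shutdown.

    resource_ids are placeholders; use the ids from your own CIB.
    """
    # Disable each resource so the CRM stops it in a regular transition.
    commands = [["pcs", "resource", "disable", rid] for rid in resource_ids]
    # Then stop pacemaker and corosync on all nodes.
    commands.append(["pcs", "cluster", "stop", "--all"])
    return commands


plan = controlled_shutdown_commands(["pgsql-ha", "vip"])
for cmd in plan:
    print(" ".join(cmd))
```

In practice one would probably check that the resources are reported as
stopped before issuing the final command, since disabling a resource only
asks the cluster to stop it.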