[ClusterLabs] Antw: Re: questions about startup fencing

Mon Dec 4 14:21:48 CET 2017

On Mon, 4 Dec 2017 12:31:06 +0100
Tomas Jelinek <tojeline at redhat.com> wrote:

> On 4.12.2017 at 10:36, Jehan-Guillaume de Rorthais wrote:
> > On Fri, 01 Dec 2017 16:34:08 -0600
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> >   
> >> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:  
> >>>
> >>>      
> >>>> Kristoffer Gronlund <kgronlund at suse.com> wrote:  
> >>>>> Adam Spiers <aspiers at suse.com> writes:
> >>>>>      
> >>>>>> - The whole cluster is shut down cleanly.
> >>>>>>
> >>>>>> - The whole cluster is then started up again.  (Side question:
> >>>>>>   what happens if the last node to shut down is not the first to
> >>>>>>   start up?  How will the cluster ensure it has the most recent
> >>>>>>   version of the CIB?  Without that, how would it know whether the
> >>>>>>   last man standing was shut down cleanly or not?)
> >>>>>
> >>>>> This is my opinion; I don't really know what the "official"
> >>>>> Pacemaker stance is: there is no such thing as shutting down a
> >>>>> cluster cleanly. A cluster is a process stretching over multiple
> >>>>> nodes - if they all shut down, the process is gone. When you start
> >>>>> up again, you effectively have a completely new cluster.
> >>>>
> >>>> Sorry, I don't follow you at all here.  When you start the cluster
> >>>> up again, the cluster config from before the shutdown is still
> >>>> there.  That's very far from being a completely new cluster :-)
> >>>
> >>> The problem is you cannot "start the cluster" in Pacemaker; you can
> >>> only "start nodes". The nodes will come up one by one. This is
> >>> unlike HP Serviceguard (as I said before), where there is a
> >>> "cluster formation timeout": the nodes wait for the specified time
> >>> for the cluster to "form", and then the cluster starts as a whole.
> >>> Of course that only applies if the whole cluster was down, not if a
> >>> single node was down.
> >>
> >> I'm not sure what that would specifically entail, but I'm guessing we
> >> have some of the pieces already:
> >>
> >> - Corosync has a wait_for_all option if you want the cluster to be
> >> unable to have quorum at start-up until every node has joined. I don't
> >> think you can set a timeout that cancels it, though.
> >>
> >> - Pacemaker will wait dc-deadtime for the first DC election to
> >> complete (if I understand it correctly...).
> >>
> >> - Higher-level tools can start or stop all nodes together (e.g. pcs has
> >> pcs cluster start/stop --all).  
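
To make the `wait_for_all` behaviour in the first point concrete, here is a toy Python model. It is not corosync code: `QuorumModel` and `is_quorate` are hypothetical names, every node is assumed to carry exactly one vote, and only start-up is modelled. With `wait_for_all` enabled, the cluster cannot become quorate until all expected nodes have been visible together at least once (roughly as votequorum(5) describes it); after that, ordinary majority rules apply. Like Ken's description, the model has no timeout that cancels the wait.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumModel:
    """Toy model of votequorum-style quorum with an optional wait_for_all flag.

    Assumptions (hypothetical, for illustration only): one vote per node,
    no two_node special case, no re-arming of wait_for_all after start-up.
    """
    expected_nodes: int
    wait_for_all: bool = False
    _seen_all_at_once: bool = field(default=False, init=False)

    def is_quorate(self, visible_nodes: set) -> bool:
        # Remember the first moment every expected node is visible together.
        if len(visible_nodes) == self.expected_nodes:
            self._seen_all_at_once = True
        # With wait_for_all, refuse quorum until that moment has happened.
        if self.wait_for_all and not self._seen_all_at_once:
            return False
        # Otherwise: simple majority of one-vote-per-node.
        return len(visible_nodes) > self.expected_nodes // 2


q = QuorumModel(expected_nodes=3, wait_for_all=True)
print(q.is_quorate({"n1", "n2"}))        # n3 never seen yet, so no quorum
print(q.is_quorate({"n1", "n2", "n3"}))  # all three seen together
print(q.is_quorate({"n1", "n2"}))        # majority, wait_for_all satisfied
```

Without `wait_for_all`, two of three nodes would be quorate immediately at start-up; with it, the first node pair has to wait for the third.
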
> > 
> > Based on this discussion, I have some questions about pcs:
> > 
> > * How does it shut down the cluster when issuing "pcs cluster stop
> > --all"?
> 
> First, it sends a request to each node to stop pacemaker. The requests 
> are sent in parallel which prevents resources from being moved from node 
> to node. Once pacemaker stops on all nodes, corosync is stopped on all 
> nodes in the same manner.
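
A minimal Python sketch of the two-phase sequence Tomas describes, under stated assumptions: `stop_service_on_all`, `cluster_stop_all` and `fake_stop` are hypothetical names, and `fake_stop` only simulates a remote stop request (this is not pcs code, and it says nothing about how pcs actually talks to the nodes). What it shows is the barrier between the phases: corosync is not stopped anywhere until every node has reported that pacemaker has stopped. The barrier does not make phase 1 atomic, though; a slow node (simulated here by `node2`) still finishes its pacemaker stop later than its peers, which is the window the follow-up question is about.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def stop_service_on_all(nodes, service, stop_fn):
    """Send a stop request for `service` to every node in parallel and
    return only when all of them have completed (acts as a barrier)."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # list() waits for every request and re-raises the first error, if any.
        return list(pool.map(lambda node: stop_fn(node, service), nodes))


def cluster_stop_all(nodes, stop_fn):
    # Phase 1: stop pacemaker on all nodes at (roughly) the same time.
    stop_service_on_all(nodes, "pacemaker", stop_fn)
    # Phase 2: only once pacemaker is down everywhere, stop corosync.
    stop_service_on_all(nodes, "corosync", stop_fn)


log = []
log_lock = threading.Lock()


def fake_stop(node, service):
    # Hypothetical stand-in for a real remote stop request.
    time.sleep(0.05 if node == "node2" else 0.01)  # node2 is the slow one
    with log_lock:
        log.append((node, service))


cluster_stop_all(["node1", "node2", "node3"], fake_stop)
print(log)  # all pacemaker stops come before any corosync stop
```
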

What if, for some external reason (load, network, whatever), one node is
slower than the others and starts reacting to the other nodes going away?
Sending the requests in parallel doesn't feel safe enough with regard to all
the race conditions that can occur at the same time.

Am I missing something?