[ClusterLabs] Antw: Re: questions about startup fencing

Tomas Jelinek tojeline at redhat.com
Tue Dec 5 09:01:06 UTC 2017


On 4.12.2017 at 23:17, Ken Gaillot wrote:
> On Mon, 2017-12-04 at 22:08 +0300, Andrei Borzenkov wrote:
>> On 04.12.2017 at 18:47, Tomas Jelinek wrote:
>>> On 4.12.2017 at 16:02, Kristoffer Grönlund wrote:
>>>> Tomas Jelinek <tojeline at redhat.com> writes:
>>>>
>>>>>>
>>>>>> * how is it shutting down the cluster when issuing "pcs
>>>>>> cluster stop
>>>>>> --all"?
>>>>>
>>>>> First, it sends a request to each node to stop pacemaker. The
>>>>> requests
>>>>> are sent in parallel which prevents resources from being moved
>>>>> from node
>>>>> to node. Once pacemaker stops on all nodes, corosync is stopped
>>>>> on all
>>>>> nodes in the same manner.
>>>>>
>>>>>> * any race condition possible where the cib will record only
>>>>>> one
>>>>>> node up before
>>>>>>      the last one shut down?
>>>>>> * will the cluster start safely?
>>>>
>>>> That definitely sounds racy to me. The best idea I can think of
>>>> would be
>>>> to set all nodes except one in standby, and then shutdown
>>>> pacemaker
>>>> everywhere...
>>>>
>>>
>>> What issues does it solve? Which node should be the one?
>>>
>>> How do you get the nodes out of standby mode on startup?
>>
>> Is --lifetime=reboot valid for cluster properties? It is accepted by
>> crm_attribute and actually puts value as transient_attribute.
> 
> standby is a node attribute, so lifetime does apply normally.
> 

Right, I forgot about this.

I was dealing with 'pcs cluster stop --all' back in January 2015, so I 
don't remember all the details anymore. However, I was able to dig out 
the private email thread where stopping a cluster was discussed with 
pacemaker developers including Andrew Beekhof and David Vossel.

Originally, pcs stopped the nodes in parallel in such a way that each 
node stopped pacemaker and then corosync independently of the other 
nodes. This caused a loss of quorum while the cluster was stopping: 
nodes hosting resources which stopped fast left corosync sooner than 
nodes hosting resources which stopped slowly. For example, in a 
three-node cluster the last node still stopping its resources is no 
longer quorate once the other two have left. With quorum gone, some 
resources could not be stopped and the cluster stop failed. This is 
covered here:
https://bugzilla.redhat.com/show_bug.cgi?id=1180506
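
Roughly, that old behaviour was equivalent to running something along 
these lines (just a simplified sketch with ssh, systemd units and 
made-up node names; pcs actually talks to pcsd on each node):

   # Each node tears down its whole stack on its own, so fast-stopping
   # nodes leave the corosync membership while slow-stopping nodes are
   # still trying to stop their resources - and may lose quorum doing so.
   for node in node1 node2 node3; do
       ssh "$node" 'systemctl stop pacemaker && systemctl stop corosync' &
   done
   wait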

The first attempt to fix the issue was to put nodes into standby mode 
with --lifetime=reboot:
https://github.com/ClusterLabs/pcs/commit/ea6f37983191776fd46d90f22dc1432e0bfc0b91
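
For illustration, the transient standby attribute that commit tried to 
set corresponds to something like this, run once per node (a sketch 
using crm_attribute; the node name is made up):

   # Put node1 into standby only until its next reboot; with
   # --lifetime reboot the attribute is transient, i.e. stored in the
   # CIB status section and cleared when the node rejoins the cluster.
   crm_attribute --node node1 --name standby --update on --lifetime reboot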

This didn't work for several reasons. One of them was that, back then, 
there was no reliable way to set standby mode with --lifetime=reboot 
for more than one node in a single step. (This may have been fixed in 
the meantime.) There were, however, other serious reasons for not 
putting the nodes into standby, as explained by Andrew:
- it [putting the nodes into standby first] means shutdown takes longer 
(no node stops until all the resources stop)
- it makes shutdown more complex (== more fragile), e.g. ...
- it can result in pcs waiting forever for resources to stop
   - if a stop fails and the cluster is configured to start at boot, 
then the node will get fenced and will happily run resources when it 
returns (because all the nodes are up, so we still have quorum)
- it only potentially benefits resources that have no (or very few) 
dependants and can stop quicker than it takes pcs to get through its 
"initiate parallel shutdown" loop (which should be rather fast, since 
there is no ssh connection setup overhead)

So we ended up with just stopping pacemaker in parallel:
https://github.com/ClusterLabs/pcs/commit/1ab2dd1b13839df7e5e9809cde25ac1dbae42c3d
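
In effect, that gives a two-phase parallel stop along these lines 
(again only a sketch with ssh and made-up node names; pcs sends the 
requests via pcsd):

   # Phase 1: stop pacemaker everywhere in parallel. Corosync stays up,
   # so quorum is kept while resources are being stopped.
   for node in node1 node2 node3; do
       ssh "$node" 'systemctl stop pacemaker' &
   done
   wait

   # Phase 2: only after pacemaker is down on all nodes, stop corosync.
   for node in node1 node2 node3; do
       ssh "$node" 'systemctl stop corosync' &
   done
   wait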

I hope this sheds some light on why pcs stops clusters the way it does, 
and shows that standby was considered but rejected for good reasons.

Regards,
Tomas
