[ClusterLabs] Reply: Re: reboot node / cluster standby

Mon Jul 3 12:30:12 EDT 2017

On 07/03/2017 08:30 AM, philipp.achmueller at arz.at wrote:
> Ken Gaillot <kgaillot at redhat.com> wrote on 29.06.2017 21:15:59:
> 
>> From: Ken Gaillot <kgaillot at redhat.com>
>> To: Ludovic Vaugeois-Pepin <ludovicvp at gmail.com>, Cluster Labs - All
>> topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Date: 29.06.2017 21:19
>> Subject: Re: [ClusterLabs] reboot node / cluster standby
>>
>> On 06/29/2017 01:38 PM, Ludovic Vaugeois-Pepin wrote:
>> > On Thu, Jun 29, 2017 at 7:27 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>> >> On 06/29/2017 04:42 AM, philipp.achmueller at arz.at wrote:
>> >>> Hi,
>> >>>
>> >>> In order to reboot a cluster node I would like to set the node to
>> >>> standby first, so a clean takeover of running resources can take place.
>> >>> Is there a default way to set this up in pacemaker, or do I have to
>> >>> set up my own systemd implementation?
>> >>>
>> >>> thank you!
>> >>> regards
>> >>> ------------------------
>> >>> env:
>> >>> Pacemaker 1.1.15
>> >>> SLES 12.2
>> >>
>> >> If a node cleanly shuts down or reboots, pacemaker will move all
>> >> resources off it before it exits, so that should happen as you're
>> >> describing, without needing an explicit standby.
>> >
> 
How does this work when evacuating e.g. 5 nodes out of a 10-node cluster
at the same time?

A clean shutdown works the same regardless of the situation:

- the OS (systemd or whatever) sends a signal to pacemakerd to exit
- a pacemaker daemon on the local node sends a shutdown request to the
DC node
- the DC node moves all resources off the node
- the DC sends an "ok to shutdown" message to the node
- the node's pacemaker daemons exit
- the OS proceeds with system shutdown
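
As a rough illustration of that ordering, here is a toy Python model. It is
only a sketch: ToyCluster and clean_shutdown are hypothetical names, not
pacemaker code, and the round-robin placement is a stand-in for the real
policy engine, which places resources according to scores and constraints.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model: maps each resource name to the node currently running it."""
    nodes: set
    placement: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def clean_shutdown(self, node: str) -> None:
        # The OS signals pacemakerd, and the node asks the DC to shut it down.
        self.log.append(f"{node}: shutdown requested")
        targets = sorted(self.nodes - {node})
        # The DC moves every resource off the node (assumption: round-robin).
        moved = [r for r, n in sorted(self.placement.items()) if n == node]
        for i, res in enumerate(moved):
            if targets:
                self.placement[res] = targets[i % len(targets)]
            else:
                del self.placement[res]  # nowhere to go: resource is stopped
        self.log.append(f"DC: moved {len(moved)} resource(s) off {node}")
        # The DC acknowledges, and only then do the node's daemons exit.
        self.log.append(f"DC: ok to shutdown {node}")
        self.nodes.discard(node)
        self.log.append(f"{node}: pacemaker exited")
        # Only after this point would the OS continue its own shutdown.


if __name__ == "__main__":
    cluster = ToyCluster(
        nodes={"node1", "node2", "node3"},
        placement={"vip": "node1", "db": "node1", "web": "node2"},
    )
    cluster.clean_shutdown("node1")
    print(cluster.placement)
    print("\n".join(cluster.log))
```

The point of the model is just the ordering: resources leave the node
before the "ok to shutdown" message, and pacemaker exits before the OS
continues.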

The only wrinkle with shutting down 5 out of 10 nodes is that you will
most likely (depending on your configuration) lose quorum, in which case
the cluster will stop all resources on all nodes.
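
To make the quorum arithmetic concrete, here is a minimal Python sketch. It
assumes a plain majority rule with one vote per node and no special quorum
options or quorum device; votes_needed and has_quorum are hypothetical
helper names, not part of any pacemaker or corosync API.

```python
def votes_needed(total_votes: int) -> int:
    """Votes required for quorum under a simple majority rule (assumption)."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, active_votes: int) -> bool:
    """True if the active partition still holds a strict majority."""
    return active_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    total = 10
    for leaving in (4, 5):
        remaining = total - leaving
        state = "quorate" if has_quorum(total, remaining) else "not quorate"
        print(f"{leaving} nodes down, {remaining} left: {state}")
```

Under these assumptions a 10-node cluster needs 6 votes, so taking down 4
nodes keeps quorum while taking down 5 does not.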

> 
>> > This makes me wonder about timeouts. Specifically OS/systemd timeouts.
>> > Say the node being shut down or rebooted holds a resource as a master,
>> > and it takes a while for the demote to complete, say 100 seconds (less
>> > than the demote timeout of 120s in this hypothetical scenario).  Will
>> > the OS/systemd wait until pacemaker exits cleanly on a regular CentOS
>> > or Debian?
>>
>> Yes. The pacemaker systemd unit file uses TimeoutStopSec=30min.
>>
>> >
>> >
>> >> Explicitly doing standby first would be useful mainly if you want to
>> >> manually check the results of the takeover before proceeding with the
>> >> reboot, and/or if you want the node to come back in standby mode next
>> >> time it joins.