[ClusterLabs] Antw: Re: reboot node / cluster standby

Thu Jul 6 13:36:02 UTC 2017

On 07/06/2017 02:21 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 29.06.2017 at 21:15 in message
> <44ee8b24-fe14-a204-f791-248546c2ff8c at redhat.com>:
>> On 06/29/2017 01:38 PM, Ludovic Vaugeois-Pepin wrote:
>>> On Thu, Jun 29, 2017 at 7:27 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>> On 06/29/2017 04:42 AM, philipp.achmueller at arz.at wrote:
>>>>> Hi,
>>>>>
>>>>> In order to reboot a cluster node I would like to set the node to standby
>>>>> first, so that a clean takeover of the running resources can take place.
>>>>> Is there a default way to set this up in Pacemaker, or do I have to set up
>>>>> my own systemd implementation?
>>>>>
>>>>> thank you!
>>>>> regards
>>>>> ------------------------
>>>>> env:
>>>>> Pacemaker 1.1.15
>>>>> SLES 12.2
>>>>
>>>> If a node cleanly shuts down or reboots, pacemaker will move all
>>>> resources off it before it exits, so that should happen as you're
>>>> describing, without needing an explicit standby.
>>>
>>> This makes me wonder about timeouts. Specifically OS/systemd timeouts.
>>> Say the node being shut down or rebooted holds a resource as a master,
>>> and it takes a while for the demote to complete, say 100 seconds (less
>>> than the demote timeout of 120s in this hypothetical scenario).  Will
>>> the OS/systemd wait until pacemaker exits cleanly on a regular CentOS
>>> or Debian?
>>
>> Yes. The pacemaker systemd unit file uses TimeoutStopSec=30min.
> 
> From crm ra info ocf:heartbeat:SAPDatabase:
> Operations' defaults (advisory minimum):
> 
>     start         timeout=1800
>     stop          timeout=1800
>     status        timeout=60
>     monitor       timeout=60 interval=120
>     methods       timeout=5
> 
> 
> ;-)
> 
> So your mileage may vary. The RA probably won't take that long, but we have VMs that need more than 6 minutes to shut down. If you shut down 10 such VMs sequentially, that adds up to over an hour, so you need to be patient (at least)...

Yes, good point -- 30 minutes is just a "good enough for most users"
default value. If someone has unusual requirements, they need to create
a systemd drop-in with a higher TimeoutStopSec.
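
As a minimal sketch of such a drop-in (the file name timeout.conf and the
90min value are only illustrative assumptions; pick a value that covers
your slowest sequence of resource stops):

```ini
# /etc/systemd/system/pacemaker.service.d/timeout.conf
# The file name is arbitrary; systemd reads any *.conf in this directory.
[Service]
# Illustrative value only. It overrides the 30min from the packaged unit.
TimeoutStopSec=90min
```

After creating the file, run "systemctl daemon-reload" so systemd picks it
up. "systemctl show pacemaker -p TimeoutStopUSec" should then report the
new value.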

>>>> Explicitly doing standby first would be useful mainly if you want to
>>>> manually check the results of the takeover before proceeding with the
>>>> reboot, and/or if you want the node to come back in standby mode next
>>>> time it joins.