[ClusterLabs] Antw: [EXT] Re: what is the "best" way to completely shutdown a two-node cluster ?

Ken Gaillot kgaillot at redhat.com
Fri Feb 11 10:23:04 EST 2022


On Fri, 2022-02-11 at 08:07 +0100, Ulrich Windl wrote:
> > > > Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on
> > > > 10.02.2022 at 16:40 in message <20220210164000.2e395a37 at karst>:
> > On Thu, 10 Feb 2022 22:15:07 +0800
> > Roger Zhou via Users <users at clusterlabs.org> wrote:
> > 
> > > On 2/9/22 17:46, Lentes, Bernd wrote:
> > > > 
> > > > ----- On Feb 7, 2022, at 4:13 PM, Jehan-Guillaume de Rorthais
> > > > jgdr at dalibo.com wrote:
> > > > 
> > > > > On Mon, 7 Feb 2022 14:24:44 +0100 (CET)
> > > > > "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote:
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > I'm currently changing a bit in my cluster because I
> > > > > > realized that my configuration for a power outage didn't
> > > > > > work as I expected. My idea is currently:
> > > > > > - first stop about 20 VirtualDomains, which are my
> > > > > > services. This will surely take some minutes. I'm thinking
> > > > > > of stopping each with a time difference of about 20 seconds
> > > > > > so as not to create too much IO load, and then ...
> > > 
> > > This part is tricky. On the one hand, it is good thinking to
> > > throttle the IO load.
> > > 
> > > On the other hand, as Jehan and Ulrich mentioned, `crm resource
> > > stop <rsc>` sets "target-role=Stopped" for each VirtualDomain,
> > > and you have to do `crm resource start <rsc>` to change it back
> > > to "target-role=Started" to start them after the power outage.
> > 
> > I wonder if, after the cluster shutdown completes, the
> > target-role=Stopped could be removed/edited offline with e.g.
> > crmadmin? That would make the VirtualDomains startable on boot.
> 
> This has also been discussed before: "restart" is implemented as
> "first change the role to stopped, then change the role to started".
> If the performing node is fenced due to a stop failure, the resource
> is never started.
> So what's needed is a transient (i.e. not saved in the CIB) "restart"
> operation that reverts to the previous state (started, most likely)
> if the node performing it dies.
> Now transfer this to "stop-all-resources": the role attribute in the
> CIB would never be changed, but maybe just all the LRMs would stop
> their resources, eventually shutting down, and when the node comes
> up again the previous state will be re-established.

Setting node standby as a transient attribute works very much like
that. When the node reboots, transient attributes are wiped, so it's
out of standby when it rejoins.
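
For example, a minimal sketch (the node name "node1" is illustrative;
with --lifetime reboot, crm_attribute stores the attribute in the
status section, so it is cleared when the node's cluster membership
restarts):

    # put node1 in standby only until it next rejoins the cluster
    crm_attribute --node node1 --name standby --update on --lifetime reboot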

> 
> > I suppose this would not be that simple, as it would require
> > updating it on all nodes, taking care of the CIB version, hash,
> > etc... But maybe some tooling could take care of this?
> > 
> > Last, if Bernd needs to stop the VirtualDomains gracefully, paying
> > attention to the I/O load, maybe he doesn't want them to start
> > automatically on boot, for the exact same reason anyway?
> 
> But you can limit the number of concurrent invocations and
> migrations, right?
> Unfortunately I cannot remember the parameter.

batch-limit is the number of actions that can be initiated
simultaneously across the whole cluster, and migration-limit is the
number of live migration actions that can be initiated simultaneously
on one node (regardless of whether it's the "from" or "to" node).
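
For example (crmsh syntax; the values are purely illustrative):

    # allow at most 3 actions in flight cluster-wide, and at most
    # 1 live migration in flight per node
    crm configure property batch-limit=3
    crm configure property migration-limit=1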

> 
> If not, that could be an interesting enhancement:
> Like the utilization attributes counting "static" resource
> consumption, one could have a dynamic resource consumption (like a
> counting semaphore) that is consumed while an operation on an
> instance naming that resource is being performed.
> So if you name your resource "concurrent_vm_ops", assign it to every
> VM configuration, and initialize the resource to something like 2 or
> 3, then you could limit the number of concurrent VM invocations.
> Likewise, for less heavy instances you could use more relaxed
> settings or no restrictions at all...
> 
> Regards,
> Ulrich
> 
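For context, the existing static utilization attributes only affect
where resources are placed, not how many operations run at once. A
rough sketch of that existing syntax in crmsh, with illustrative
names, values, and paths:

    # consider utilization when placing resources
    crm configure property placement-strategy=utilization
    # node1 offers 4 "vm_slots"; each VM placed there consumes 1
    crm configure node node1 utilization vm_slots=4
    crm configure primitive vm1 ocf:heartbeat:VirtualDomain \
        params config=/etc/libvirt/qemu/vm1.xml \
        utilization vm_slots=1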

You can accomplish something similar with an ordering constraint with
kind=Serialize. In the case of "start vm1 then start vm2" with
kind=Serialize, it means that vm1 and vm2 will not be started
simultaneously, but neither actually requires the other or has to be
done in a specific order.
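
For example (crmsh syntax; the resource names are illustrative):

    # starts of vm1, vm2, and vm3 will never overlap, but no fixed
    # order or dependency between them is enforced
    crm configure order serialize-vm-starts Serialize: \
        vm1:start vm2:start vm3:start
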
-- 
Ken Gaillot <kgaillot at redhat.com>


