[ClusterLabs] Antw: [EXT] Re: what is the "best" way to completely shutdown a two-node cluster ?
Ulrich.Windl at rz.uni-regensburg.de
Fri Feb 11 02:07:33 EST 2022
>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 10.02.2022 in
message <20220210164000.2e395a37 at karst>:
> On Thu, 10 Feb 2022 22:15:07 +0800
> Roger Zhou via Users <users at clusterlabs.org> wrote:
>> On 2/9/22 17:46, Lentes, Bernd wrote:
>> > ----- On Feb 7, 2022, at 4:13 PM, Jehan-Guillaume de Rorthais
>> > jgdr at dalibo.com wrote:
>> >> On Mon, 7 Feb 2022 14:24:44 +0100 (CET)
>> >> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote:
>> >>> Hi,
>> >>> I'm currently changing a bit in my cluster because I realized that my
>> >>> configuration for a power outage didn't work as I expected. My idea
>> >>> currently:
>> >>> - first stop about 20 VirtualDomains, which are my services. This will
>> >>> surely take some minutes. I'm thinking of stopping each with a time
>> >>> difference of about 20 seconds so as not to get too much IO load, and
>> >>> ...
>> This part is tricky. On the one hand, it is good thinking to throttle the IO
>> load. On the other hand, as Jehan and Ulrich mentioned, `crm resource stop
>> <rsc>` sets "target‑role=Stopped" on each VirtualDomain, and you have to run
>> `crm resource start <rsc>` to change it back to "target‑role=Started" to
>> start them after the power outage.
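A staggered shutdown along these lines could look like the sketch below. The resource names and the 20-second spacing are assumptions taken from the thread, not tested against a real cluster:

```shell
# Stop each VirtualDomain resource with a 20 s gap to throttle IO load.
# VM_LIST is a placeholder; substitute your actual resource names.
VM_LIST="vm01 vm02 vm03"

for vm in $VM_LIST; do
    crm resource stop "$vm"    # sets target-role=Stopped in the CIB
    sleep 20
done

# After power is restored, the roles must be flipped back explicitly:
for vm in $VM_LIST; do
    crm resource start "$vm"   # sets target-role=Started again
done
```

Note that `crm resource stop` only updates the CIB and returns immediately, so the sleep spaces out when the stops are *requested*, not when they complete.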
> I wonder whether, after the cluster shutdown completes, target-role=Stopped
> could be removed/edited offline with e.g. crmadmin? That would make the
> VirtualDomains startable on boot.
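One possible way to do such an offline edit is to point the Pacemaker command-line tools at the on-disk CIB via the `CIB_file` environment variable. A rough sketch (the path is the usual default and "vm01" a placeholder; verify both on your system, and mind the cross-node CIB version questions raised below):

```shell
# With pacemaker stopped on this node, operate directly on the CIB file.
# CIB_file is honored by the Pacemaker command-line tools.
export CIB_file=/var/lib/pacemaker/cib/cib.xml

# Remove the target-role meta attribute so the VM may start on next boot.
crm_resource --resource vm01 --meta --delete-parameter target-role
```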
It has also been discussed before: "restart" is implemented as "first change the
role to Stopped, then change it back to Started".
If the performing node is fenced due to a stop failure, the resource is never
started again.
So what's needed is a transient (i.e.: not saved in the CIB) "restart"
operation that reverts to the previous state (started, most likely) if the node
performing it dies.
Now transfer this to "stop-all-resources": the role attributes in the CIB would
never be changed; instead all the LRMs would stop their resources, the nodes
would eventually shut down, and when a node comes up again the previous state
would be re-established.
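For reference, Pacemaker already has a cluster property that stops everything without touching per-resource target-role attributes. A sketch (untested) of using it around a full shutdown:

```shell
# Stop every resource cluster-wide, without per-resource role changes:
crm_attribute --name stop-all-resources --update true

# ... shut the nodes down, power off, later power back up, then:
crm_attribute --name stop-all-resources --delete
```

This does not by itself throttle the IO load of 20 VMs stopping at once, though, which was Bernd's original concern.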
> I suppose this would not be that simple, as it would require updating it on
> all nodes, taking care of the CIB version, hash, etc... But maybe some
> tooling could take care of this?
> Last, if Bernd needs to stop the VirtualDomains gracefully while paying
> attention to the I/O load, maybe he doesn't want them to start automatically
> on boot for the exact same reason anyway?
But you can limit the number of concurrent invocations and migrations, right?
Unfortunately I cannot remember the parameter.
If not, that could be some interesting enhancement:
Like the utilization counting "static" resource consumption, one could have a
dynamic resource consumption (counting-semaphore-like) that is consumed while
an operation on an instance naming that resource is being performed.
So when you name your resource "concurrent_vm_ops" and assign it to every VM
configuration, eventually initializing the resource to something like 2 or 3,
then you could limit the concurrent VM invocations. Likewise, for less heavy
instances you could use more relaxed settings, or no restrictions at all...
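The existing *static* utilization feature is the closest thing available today; a rough crmsh sketch is below. Note this limits concurrently *running* resources per node, not concurrent start operations as proposed above, and the attribute name "vm_slots", the capacities, and the exact crmsh syntax are illustrative assumptions to verify against your crmsh version:

```shell
# Place resources by utilization instead of plain scores:
crm configure property placement-strategy=utilization

# Hypothetical capacity attribute on each node:
crm node utilization node1 set vm_slots 10
crm node utilization node2 set vm_slots 10

# Each VirtualDomain resource consumes one slot:
crm resource utilization vm01 set vm_slots 1
```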