[ClusterLabs] Antw: Re: Antw: Re: Gracefully stop nodes one by one with disk-less sbd
Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 12 06:01:32 EDT 2019
>>> Roger Zhou <ZZhou at suse.com> wrote on 12.08.2019 at 10:55 in message
<7249e013-1256-675a-3cea-3572f4615ee1 at suse.com>:
> On 8/12/19 2:48 PM, Ulrich Windl wrote:
>>>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 09.08.2019 at 18:40 in
>> message <217d10d8-022c-eaf6-28ae-a4f58b2f97af at gmail.com>:
>>> 09.08.2019 16:34, Yan Gao wrote:
>>> The lack of a cluster-wide shutdown mode was mentioned more than once on this
>>> list. I guess the only workaround is to use higher-level tools, which
>>> basically just try to stop the cluster on all nodes at once.
> I thought of using ssh/pssh to reach the involved nodes and stop the diskless
> SBD daemons. However, SBD cannot be torn down on its own. It is
> tightly coupled with pacemaker and corosync, and all three have to be stopped
> together. The alternative is to hack the SBD dependencies.
>>> It is still
>>> susceptible to a race condition.
>> Are there any concrete plans to implement a clean solution?
> I can think of Yet Another Feature to disable diskless SBD on purpose,
> e.g. letting SBD honor "stonith-enabled=false" cluster-wide.
I imagine that some new mechanism would be needed to have non-persistent or
self-resetting attribute changes in the CIB:
For example if you do a "resource restart" and the node where the command runs
is fenced during the "stop" phase, the resource remains stopped until started
manually. This is because the "restart" is implemented as sequential non-atomic
"stop, then start".
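
To make the race concrete, here is a toy model in Python. It is not Pacemaker code: the CIB is reduced to a plain dict, and the names ToyCib, restart and fenced_after_stop are made up for illustration. The only point is that the "stop" is written persistently before the "start" is issued, so a caller that dies in between leaves the resource stopped.

```python
class NodeFenced(Exception):
    """Raised when the node running the command is fenced mid-operation."""


class ToyCib:
    """Stand-in for the CIB: persistent settings survive the caller's death."""

    def __init__(self):
        self.target_role = {}

    def set_role(self, resource, role):
        self.target_role[resource] = role


def restart(cib, resource, fenced_after_stop=False):
    """Non-atomic restart: a persistent 'stop' followed by a separate 'start'."""
    cib.set_role(resource, "Stopped")
    if fenced_after_stop:
        # The caller dies here, so the 'start' step below is never issued.
        raise NodeFenced(resource)
    cib.set_role(resource, "Started")


cib = ToyCib()

# Undisturbed restart: the resource ends up started again.
restart(cib, "db")
print(cib.target_role["db"])

# Caller fenced during the stop phase: the resource stays stopped.
try:
    restart(cib, "web", fenced_after_stop=True)
except NodeFenced:
    pass
print(cib.target_role["web"])
```

A self-resetting change would have to undo the "Stopped" setting without relying on the node that issued the command.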
Similarly for a "cluster stop": There is an attribute "stop-all-resources"
(AFAIR). A "cluster stop" could temporarily set this to get all resources on
all nodes stopped. Then the pacemakers and corosyncs and sbds should stop. On
restart each node should start up normally...
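
A minimal sketch of how such a self-resetting setting might behave, written as a toy Python model rather than anything pacemaker offers today. "stop-all-resources" is the attribute mentioned above; the transient store and the cluster_stop/cluster_start helpers are hypothetical names of mine.

```python
class ToyCluster:
    """Toy model: persistent properties survive a full stop, transient ones do not."""

    def __init__(self, nodes):
        self.nodes = {n: "online" for n in nodes}
        self.persistent = {"stop-all-resources": "false"}
        self.transient = {}  # hypothetical self-resetting overrides

    def prop(self, name):
        # A transient override wins for as long as it exists.
        return self.transient.get(name, self.persistent[name])

    def resources_may_run(self):
        return self.prop("stop-all-resources") != "true"

    def cluster_stop(self):
        # 1. Stop all resources everywhere via a transient override.
        self.transient["stop-all-resources"] = "true"
        assert not self.resources_may_run()
        # 2. Only then stop pacemaker, corosync and sbd on every node.
        for node in self.nodes:
            self.nodes[node] = "offline"
        # 3. With the last node gone, the transient override is forgotten.
        self.transient.clear()

    def cluster_start(self):
        for node in self.nodes:
            self.nodes[node] = "online"


cluster = ToyCluster(["node1", "node2", "node3"])
cluster.cluster_stop()
cluster.cluster_start()
print(cluster.resources_may_run())
```

Because the persistent value is never touched, nothing has to be cleaned up by hand after the restart.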
BTW: HP-UX ServiceGuard had not only a command to stop the cluster, but also
one to start the cluster. I imagine that such a command could work nicely with
pacemaker as well: It would first start all the SBDs, corosyncs, and pacemakers,
and once the DC is elected, resources would start without needless shuffling
(migration) of resources between nodes joining the cluster.
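
The shuffling effect can be illustrated with a small Python toy. It assumes a naive round-robin placement and ignores stickiness, constraints and everything else the real scheduler considers; place and count_moves are illustrative names only.

```python
def place(resources, nodes):
    """Naive balanced placement: spread resources round-robin over nodes."""
    return {r: nodes[i % len(nodes)] for i, r in enumerate(resources)}


def count_moves(resources, join_steps):
    """Count resources that change node as the membership grows step by step."""
    moves = 0
    placement = {}
    members = []
    for joining in join_steps:
        members.extend(joining)
        new_placement = place(resources, members)
        # A resource that was already placed and now sits elsewhere has migrated.
        moves += sum(1 for r in placement if placement[r] != new_placement[r])
        placement = new_placement
    return moves


resources = ["r1", "r2", "r3", "r4"]
one_by_one = count_moves(resources, [["node1"], ["node2"], ["node3"]])
all_at_once = count_moves(resources, [["node1", "node2", "node3"]])
print(one_by_one, all_at_once)
```

In this toy, nodes joining one by one cause 4 migrations, while placing resources only after all three nodes have joined causes none.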