<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 8/9/19 9:06 PM, Yan Gao wrote:<br>
</div>
<blockquote type="cite"
cite="mid:ef9f2e79-80ba-d86c-41e4-394ca8084557@suse.com">
<pre class="moz-quote-pre" wrap="">On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">09.08.2019 16:34, Yan Gao wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi,
With disk-less sbd, it's fine to stop the cluster service on all cluster
nodes at the same time.
But when stopping the nodes one by one, for example in a 3-node cluster,
after stopping the 2nd node, the only remaining node resets itself with:
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
That is sort of documented in the SBD manual page:
--><--
However, while the cluster is in such a degraded state, it can
neither successfully fence nor be shutdown cleanly (as taking the
cluster below the quorum threshold will immediately cause all remaining
nodes to self-fence).
--><--
SBD in shared-nothing mode is basically always in such a degraded state
and cannot tolerate loss of quorum.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Well, the context here is it loses quorum *expectedly* since the other
nodes gracefully shut down.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Aug 09 14:30:20 opensuse150-1 sbd[1079]: pcmk: debug: notify_parent: Not notifying parent: state transient (2)
Aug 09 14:30:20 opensuse150-1 sbd[1080]: cluster: debug: notify_parent: Notifying parent: healthy
Aug 09 14:30:20 opensuse150-1 sbd[1078]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)

I can think of a way to manipulate quorum with last_man_standing and
potentially also auto_tie_breaker, not to mention that
last_man_standing_window would also be a factor... But is there a better
solution?
</pre>
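<pre class="moz-quote-pre" wrap="">A minimal sketch of where the votequorum options named above would go in
corosync.conf, assuming a 3-node cluster. The values are illustrative
assumptions, and whether this keeps disk-less sbd from self-fencing on
an expected quorum loss is exactly the open question.

```
quorum {
    provider: corosync_votequorum
    expected_votes: 3
    # recalculate expected_votes as nodes leave the cluster
    last_man_standing: 1
    # milliseconds to wait before recalculating (10000 is the documented default)
    last_man_standing_window: 10000
    # needed for the step from 2 nodes down to 1
    auto_tie_breaker: 1
}
```
</pre>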
</blockquote>
<pre class="moz-quote-pre" wrap="">
Lack of a cluster-wide shutdown mode has been mentioned more than once on
this list. I guess the only workaround is to use higher-level tools, which
basically just try to stop the cluster on all nodes at once. That is still
susceptible to a race condition.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Gracefully stopping nodes one by one on purpose is still a reasonable
need though ...</pre>
</blockquote>
<tt>If you do the teardown the way e.g. pcs does it, first tearing down</tt><tt><br>
</tt><tt>the pacemaker instances and then corosync/sbd, it is at</tt><tt><br>
</tt><tt>least possible to tear down the pacemaker instances one by one</tt><tt><br>
</tt><tt>without risking a reboot due to quorum loss.</tt><tt><br>
</tt><tt>With a reasonably current sbd that includes</tt><tt><br>
</tt><tt>- </tt><tt><a
href="https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945">https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945</a></tt><tt><br>
</tt><tt>- </tt><tt><a
href="https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70">https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70</a></tt><tt><br>
</tt><tt>- </tt><tt><a
href="https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68">https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68</a></tt><tt><br>
</tt><tt>this should be pretty robust, although we are still thinking</tt><tt><br>
</tt><tt>(probably together with some heartbeat to pacemakerd</tt><tt><br>
</tt><tt>that ensures pacemakerd is properly checking the liveness of its sub-daemons)</tt><tt><br>
</tt><tt>about a cleaner way to detect a graceful</tt><tt><br>
</tt><tt>pacemaker shutdown.</tt><tt><br>
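</tt><tt>A minimal sketch of the teardown order described above: pacemaker on</tt><br>
<tt>every node first, then corosync/sbd. It only builds a plan and runs</tt><br>
<tt>nothing. The helper teardown_plan, the node names and the systemd unit</tt><br>
<tt>names are assumptions for illustration, as is sbd stopping together</tt><br>
<tt>with the corosync unit.</tt><br>
<pre>
```python
def teardown_plan(nodes):
    """Return (node, command) pairs for a staged cluster teardown.

    Hypothetical helper: nothing is executed here, the caller decides
    how (and whether) to run each command on each node.
    """
    plan = []
    # Stage 1: stop pacemaker on every node, one node at a time.
    for node in nodes:
        plan.append((node, "systemctl stop pacemaker"))
    # Stage 2: only after pacemaker is down everywhere, stop corosync.
    # Assumption: sbd is tied to the corosync unit and stops with it.
    for node in nodes:
        plan.append((node, "systemctl stop corosync"))
    return plan


if __name__ == "__main__":
    # Node names are placeholders for a 3-node cluster.
    for node, command in teardown_plan(["node1", "node2", "node3"]):
        print(f"{node}: {command}")
```
</pre>
<tt>Only the pacemaker stage is stated above to be safe one node at a time;</tt><br>
<tt>how to sequence the corosync/sbd stage is left open in this sketch.</tt><br>
<tt>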
</tt><tt><br>
</tt><tt>Klaus</tt><br>
<blockquote type="cite"
cite="mid:ef9f2e79-80ba-d86c-41e4-394ca8084557@suse.com">
<pre class="moz-quote-pre" wrap="">
Regards,
Yan</pre>
</blockquote>
<br>
</body>
</html>