<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 8/9/19 9:06 PM, Yan Gao wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:ef9f2e79-80ba-d86c-41e4-394ca8084557@suse.com">

      <pre class="moz-quote-pre" wrap="">On 8/9/19 6:40 PM, Andrei Borzenkov wrote:

</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">09.08.2019 16:34, Yan Gao wrote:

</pre>

        <blockquote type="cite">

          <pre class="moz-quote-pre" wrap="">Hi,

With disk-less sbd, it's fine to stop the cluster service on all cluster

nodes at the same time.

But when stopping the nodes one by one, for example in a 3-node cluster,

after stopping the 2nd node, the only remaining node resets itself with:

</pre>

        </blockquote>

        <pre class="moz-quote-pre" wrap="">

That is sort of documented in the SBD manual page:

--><--

However, while the cluster is in such a degraded state, it can

neither successfully fence nor be shutdown cleanly (as taking the

cluster below the quorum threshold will immediately cause all remaining

nodes to self-fence).

--><--

SBD in shared-nothing mode is basically always in such a degraded state

and cannot tolerate loss of quorum.

</pre>

      </blockquote>
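      <pre class="moz-quote-pre" wrap="">To make the quorum arithmetic in the explanation above concrete, here is
a minimal sketch. It assumes one vote per node, a plain majority rule,
and an expected-votes count that stays at the full cluster size while
nodes leave (no last_man_standing or similar). The function names are
made up for illustration and are not part of corosync or sbd.

```python
def quorum_threshold(expected_votes):
    """Plain majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(active_votes, expected_votes):
    """True while the active votes still reach the majority threshold."""
    return active_votes >= quorum_threshold(expected_votes)


def stop_one_by_one(total_nodes):
    """Return (remaining_nodes, quorate) after each graceful stop.

    Assumption: expected votes stay at total_nodes throughout.
    """
    return [
        (remaining, is_quorate(remaining, total_nodes))
        for remaining in range(total_nodes - 1, 0, -1)
    ]


print(stop_one_by_one(3))
```

For a 3-node cluster this gives [(2, True), (1, False)]: quorum survives
the first stop and is lost on the second, which is the point where the
remaining node self-fences.
</pre>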

      <pre class="moz-quote-pre" wrap="">Well, the context here is that it loses quorum *expectedly*, since the

other nodes shut down gracefully.

</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">

</pre>

        <blockquote type="cite">

          <pre class="moz-quote-pre" wrap="">Aug 09 14:30:20 opensuse150-1 sbd[1079]:       pcmk:    debug:

notify_parent: Not notifying parent: state transient (2)

Aug 09 14:30:20 opensuse150-1 sbd[1080]:    cluster:    debug:

notify_parent: Notifying parent: healthy

Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child:

Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)

I can think of a way to manipulate quorum with last_man_standing and

potentially also auto_tie_breaker, not to mention that

last_man_standing_window would also be a factor... But is there a better

solution?

</pre>

        </blockquote>

        <pre class="moz-quote-pre" wrap="">

The lack of a cluster-wide shutdown mode has been mentioned more than once

on this list. I guess the only workaround is to use higher-level tools, which

basically just try to stop the cluster on all nodes at once. That is still

susceptible to race conditions.

</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">Gracefully stopping nodes one by one on purpose is still a reasonable 

need though ...</pre>

    </blockquote>

    <tt>If you do the teardown the way e.g. pcs does it - first tear down</tt><tt><br>

    </tt><tt>the pacemaker instances and then corosync/sbd - it is at</tt><tt><br>

    </tt><tt>least possible to tear down the pacemaker instances one by one</tt><tt><br>

    </tt><tt>without risking a reboot due to quorum loss.</tt><tt><br>
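    </tt><tt>A minimal sketch of that ordering, purely as an illustration: the function name and the node names are hypothetical, and the sketch only prints the order of the stop steps rather than calling pcs or any service manager.</tt><br>
    <pre>
```python
def teardown_plan(nodes):
    """Order the stop steps: pacemaker on every node first, then corosync/sbd."""
    # Phase 1: stop pacemaker one node at a time. Corosync keeps running,
    # so cluster membership and quorum do not change during this phase.
    plan = [(node, "pacemaker") for node in nodes]
    # Phase 2: only once pacemaker is down everywhere, stop corosync/sbd.
    plan += [(node, "corosync/sbd") for node in nodes]
    return plan


# Hypothetical node names for a 3-node cluster.
for node, service in teardown_plan(["node1", "node2", "node3"]):
    print(f"{node}: stop {service}")
```
</pre>
    <tt>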

    </tt><tt>With a reasonably current sbd that includes</tt><tt><br>

    </tt><tt>- </tt><tt><a

href="https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945">https://github.com/ClusterLabs/sbd/commit/824fe834c67fb7bae7feb87607381f9fa8fa2945</a></tt><tt><br>

    </tt><tt>- </tt><tt><a

href="https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70">https://github.com/ClusterLabs/sbd/commit/79b778debfee5b4ab2d099b2bfc7385f45597f70</a></tt><tt><br>

    </tt><tt>- </tt><tt><a

href="https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68">https://github.com/ClusterLabs/sbd/commit/a716a8ddd3df615009bcff3bd96dd9ae64cb5f68</a></tt><tt><br>

    </tt><tt>this should be pretty robust, although we are still thinking</tt><tt><br>

    </tt><tt>about a cleaner way to detect a graceful pacemaker shutdown</tt><tt><br>

    </tt><tt>(probably together with some heartbeat to pacemakerd that</tt><tt><br>

    </tt><tt>ensures pacemakerd is properly checking the liveness of its</tt><tt><br>

    </tt><tt>sub-daemons).</tt><tt><br>

    </tt><tt><br>

    </tt><tt>Klaus</tt><br>

    <blockquote type="cite"

      cite="mid:ef9f2e79-80ba-d86c-41e4-394ca8084557@suse.com">

      <pre class="moz-quote-pre" wrap="">

Regards,

   Yan

</pre>

    </blockquote>

    <br>

  </body>

</html>