<div dir="ltr">Hi Klaus,<div>Wishing you a great 2020!</div><div>We're using 3 SBD disks with pacemaker integration. The restart happened just once, but I am able to reproduce the latency error messages in the system log by inducing a network delay in the VM that hosts the SBD disks. These are the only messages that were logged before the VM restarted.</div><div>The SBD documentation (<a href="https://www.mankier.com/8/sbd">https://www.mankier.com/8/sbd</a>) says that having 1 SBD disk does not introduce a single point of failure. I also tested this configuration by offlining a disk, and pacemaker worked just fine. In your experience, is it safe to run the cluster with one SBD disk? This is a 2-node HANA database cluster in which one node is primary. The data is replicated using the native database tools, so there is no shared DB storage, and a split-brain scenario is less likely because the secondary database does not accept any writes.</div><div>Regards,</div><div>JK</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 2, 2020 at 6:35 PM Klaus Wenninger <<a href="mailto:kwenning@redhat.com">kwenning@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 12/26/19 9:27 AM, Roger Zhou wrote:<br>

> On 12/24/19 11:48 AM, Jerry Kross wrote:<br>

>> Hi,<br>

>> The pacemaker cluster manages a 2 node database cluster configured to use 3 <br>

>> iscsi disk targets in its stonith configuration. The pacemaker cluster was put <br>

>> in maintenance mode but we see SBD writing to the system logs. And just after <br>

>> these logs, the production node was restarted.<br>

>> Log:<br>

>> sbd[5955]:  warning: inquisitor_child: Latency: No liveness for 37 s exceeds <br>

>> threshold of 36 s (healthy servants: 1)<br>

>> I see these messages logged, and then the node was restarted. I suspect it <br>

>> was the softdog module that restarted the node, but I don't see it in the logs. <br>

Just to understand your config ...<br>

You are using 3 block-devices with quorum amongst each other without<br>

pacemaker-integration - right?<br>

Might be that the disk-watchers are hanging on some io so that<br>

we don't see any logs from them.<br>

Did that happen just once or can you reproduce the issue?<br>

If you are not using pacemaker-integration so far, that might be a<br>

way to increase reliability. (If it sees the other node, sbd would be content<br>

without getting a response from the disks.) Of course, it depends on your<br>

distribution and sbd-version whether that would be supported with a<br>

2-node-cluster (or at all). sbd would, e.g., have to include at least<br>

<a href="https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377" rel="noreferrer" target="_blank">https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377</a><br>

<br>

Klaus <br>

> sbd is too critical to share the io path with others.<br>

><br>

> Very likely, the workload is too heavy, the iscsi connections are broken, and <br>

> sbd loses access to the disks; sbd then uses sysrq 'b' to reboot the node <br>

> brutally and immediately.<br>

><br>

> Regarding watchdog-reboot, it kicks in when sbd is not able to tickle it <br>

> in time, e.g. when sbd starves for cpu or has crashed. It is crucial too, but not <br>

> likely the case here.<br>

><br>

> Merry X'mas and Happy New Year!<br>

> Roger<br>

><br>

> _______________________________________________<br>

> Manage your subscription:<br>

> <a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>

><br>

> ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>

<br>

</blockquote></div>
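<div>The latency warning quoted in this thread is the only trace left before the restart, so it may help to pull the numbers out of the system log while reproducing the issue. The following is a minimal Python sketch, not something taken from sbd itself: the function name <code>parse_latency_warning</code> and the regular expression are assumptions based solely on the single log line quoted above, and other sbd versions may word the message differently.</div>
<pre>
```python
import re

# Pattern derived only from the warning quoted in this thread (an assumption);
# other sbd versions may format the message differently.
LATENCY_RE = re.compile(
    r"No liveness for (\d+) s exceeds threshold of (\d+) s "
    r"\(healthy servants: (\d+)\)"
)


def parse_latency_warning(line):
    """Return (liveness_s, threshold_s, healthy_servants), or None if no match."""
    # Collapse whitespace so a message wrapped across lines still matches.
    match = LATENCY_RE.search(" ".join(line.split()))
    if match is None:
        return None
    liveness, threshold, healthy = (int(g) for g in match.groups())
    return liveness, threshold, healthy


# The log line from the original report, including the mail client's line wrap.
sample = (
    "sbd[5955]:  warning: inquisitor_child: Latency: No liveness for 37 s "
    "exceeds\nthreshold of 36 s (healthy servants: 1)"
)
print(parse_latency_warning(sample))
```
</pre>
<div>Feeding journal or syslog lines through such a helper while the network delay is induced would show how far the liveness gap runs past the threshold before the node restarts.</div>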