[ClusterLabs] SBD restarted the node while pacemaker in maintenance mode

Klaus Wenninger kwenning at redhat.com
Wed Jan 8 04:30:28 EST 2020

On 1/8/20 9:28 AM, Jerry Kross wrote:
> Thanks Klaus. Yes, I was able to reproduce the latency messages by
> inducing a network delay in the SBD VM and the node did not reboot.
> We also had a production issue where the primary node of a 2 node
> cluster was fenced when the primary node lost connectivity to 2 out of
> the 3 SBD disks. The error message is "Warning: inquisitor_child
> requested a reset"
Did the 2 cluster nodes lose connectivity to each other as well?
> The SBD configuration is integrated with the pacemaker cluster. The
> reboot would have happened
Just to assure we are talking about the same thing: by pacemaker
integration I mean the '-P' option (the default; giving it a
second time turns it off). To verify, check for the presence of
the 'sbd: watcher: Pacemaker' & 'sbd: watcher: Cluster' sub-daemons.
In your case, corosync.conf should of course also contain
quorum { ... two_node: 1 ... } to tell sbd it should count nodes
instead of relying on quorum.
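For reference, the corosync.conf quorum section for such a 2-node
setup would look roughly like this (section contents abbreviated,
other options omitted):

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```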
> because of 2 events: 1) access was lost to 3 SBD disks , 2) Pacemaker
> regarded this node as
1) shouldn't trigger a reboot by itself as long as the nodes see each
other, while 2) would of course trigger self-fencing.
> unhealthy (although this is not clear from the logs). But the
> trigger was the loss of connectivity, and I am not sure whether
> pacemaker regarded this node as unhealthy because the node lost
> connectivity to the 2 SBD disks.
Losing 2 out of 3 disks should impose the same behavior as
losing 1 disk in a single-disk setup.

This reminds me to add test-case(s) to CI that verify the
disk-quorum behavior ;-)
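To illustrate the rule (a hypothetical sketch, not sbd's actual code):
sbd needs a majority of the configured disks to consider the disk side
healthy, so 1 healthy disk out of 3 is just as bad as 0 out of 1.

```python
def disks_required(n_disks: int) -> int:
    # Majority of the configured disks (illustrative only).
    return n_disks // 2 + 1

def disk_quorum_ok(healthy: int, n_disks: int) -> bool:
    # Disk side counts as healthy only with a majority of disks reachable.
    return healthy >= disks_required(n_disks)

# Losing 2 of 3 disks behaves like losing the only disk in a 1-disk setup:
print(disk_quorum_ok(1, 3))  # False
print(disk_quorum_ok(0, 1))  # False
print(disk_quorum_ok(2, 3))  # True
```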
> In such a scenario, Having 1 SBD device would be sufficient?
As already said, with pacemaker-integration: in principle, yes.
Unless you have e.g. a setup with 3 disks at 3 sites and
2 nodes at 2 of these sites, where you still want to provide
service while entirely losing one of the node-sites.

To further assure we are on the same page, some more
info about your distribution, the version/origin of sbd & pacemaker,
and your sbd & corosync configuration might be helpful.
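As a starting point, the sbd side of that config usually lives in
/etc/sysconfig/sbd; a minimal sketch with pacemaker integration and
three disks (device paths are placeholders, values illustrative) would be:

```
SBD_DEVICE="/dev/disk/by-id/diskA;/dev/disk/by-id/diskB;/dev/disk/by-id/diskC"
SBD_PACEMAKER=yes
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
```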

> Regards,
> JK
> On Tue, Jan 7, 2020 at 6:20 PM Klaus Wenninger <kwenning at redhat.com
> <mailto:kwenning at redhat.com>> wrote:
>     On 1/6/20 8:40 AM, Jerry Kross wrote:
>>     Hi Klaus,
>>     Wishing you a great 2020!
>     Same to you!
>>     We're using 3 SBD disks with pacemaker integration. It
>>     happened just once, and I am able to reproduce the latency error messages
>>     in the system log by inducing a network delay in the VM that
>>     hosts the SBD disks. These are the only messages that were logged
>>     before the VM restarted.
>     You mean you can reproduce the latency messages but they don't
>     trigger a reboot - right?
>>     From the SBD documentation, https://www.mankier.com/8/sbd, it
>>     says that having 1 SBD disk does not introduce a single point of
>>     failure. I also tested this configuration by offlining a disk and
>>     pacemaker worked just fine. From your experience, is it safe to
>>     run the cluster with one SBD disk? This is a 2 node Hana database
>>     cluster, where one is primary. The data is replicated using the
>>     native database tools. So there's no shared DB storage, and a
>>     split-brain scenario is less likely to occur. This is because
>>     the secondary database does not accept any writes.
>     When set up properly, so that a node reboots if it loses
>     its pacemaker-partner and the disk at the same time, a 2-node
>     cluster with SBD and a single disk should be safe to operate.
>     As you already pointed out the disk isn't a SPOF as a node will
>     still provide service as long as it sees the partner.
>     Stating the obvious: Using just a single disk with pacemaker
>     integration isn't raising the risk of split-brain but rather
>     raises the risk of an unneeded node-reboot. So if your setup
>     is likely to e.g. lose the connection between the
>     partner-nodes and that to the disk simultaneously, it may
>     be interesting to have something like 3 disks at 3 sites or
>     to step away from the 2-node config in corosync in favor of
>     real quorum using qdevice.
>     I'm not very familiar with Hana-specific issues though.
>     Klaus
>>     Regards,
>>     JK
>>     On Thu, Jan 2, 2020 at 6:35 PM Klaus Wenninger
>>     <kwenning at redhat.com <mailto:kwenning at redhat.com>> wrote:
>>         On 12/26/19 9:27 AM, Roger Zhou wrote:
>>         > On 12/24/19 11:48 AM, Jerry Kross wrote:
>>         >> Hi,
>>         >> The pacemaker cluster manages a 2 node database cluster
>>         configured to use 3
>>         >> iscsi disk targets in its stonith configuration. The
>>         pacemaker cluster was put
>>         >> in maintenance mode but we see SBD writing to the system
>>         logs. And just after
>>         >> these logs, the production node was restarted.
>>         >> Log:
>>         >> sbd[5955]:  warning: inquisitor_child: Latency: No
>>         liveness for 37 s exceeds
>>         >> threshold of 36 s (healthy servants: 1)
>>         >> I see these messages logged and then the node was
>>         restarted. I suspect if it
>>         >> was the softdog module that restarted the node but I don't
>>         see it in the logs.
>>         Just to understand your config ...
>>         You are using 3 block-devices with quorum amongst each other
>>         without
>>         pacemaker-integration - right?
>>         Might be that the disk-watchers are hanging on some io so that
>>         we don't see any logs from them.
>>         Did that happen just once or can you reproduce the issue?
>>         If you are not using pacemaker-integration so far, that might be a
>>         way to increase reliability. (If it sees the other node, sbd
>>         would be content
>>         without getting a response from the disks.) Of course it
>>         depends on your
>>         distribution and sbd-version whether that would be supported
>>         with a 2-node-cluster
>>         (or at all). sbd e.g. would need at least
>>         https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377
>>         Klaus 
>>         > sbd is too critical to share the io path with others.
>>         >
>>         > Very likely, the workload is too heavy, the iscsi
>>         connections are broken, and
>>         > sbd loses access to the disks; then sbd uses sysrq 'b'
>>         to reboot the node
>>         > brutally and immediately.
>>         >
>>         > In regard to the watchdog-reboot, it kicks in when sbd is
>>         not able to tickle it
>>         > in time, e.g. when sbd starves for cpu or has crashed. It is
>>         crucial too, but not
>>         > likely the case here.
>>         >
>>         > Merry X'mas and Happy New Year!
>>         > Roger
>>         >
>>         > _______________________________________________
>>         > Manage your subscription:
>>         > https://lists.clusterlabs.org/mailman/listinfo/users
>>         >
>>         > ClusterLabs home: https://www.clusterlabs.org/
