[ClusterLabs] SBD restarted the node while pacemaker in maintenance mode
kwenning at redhat.com
Thu Jan 2 05:35:32 EST 2020
On 12/26/19 9:27 AM, Roger Zhou wrote:
> On 12/24/19 11:48 AM, Jerry Kross wrote:
>> The pacemaker cluster manages a 2 node database cluster configured to use 3
>> iscsi disk targets in its stonith configuration. The pacemaker cluster was put
>> in maintenance mode but we see SBD writing to the system logs. And just after
>> these logs, the production node was restarted.
>> sbd: warning: inquisitor_child: Latency: No liveness for 37 s exceeds
>> threshold of 36 s (healthy servants: 1)
>> I see these messages logged, and then the node was restarted. I suspect it
>> was the softdog module that restarted the node, but I don't see it in the logs.
Just to understand your config ...
You are using 3 block-devices with quorum amongst each other, without
pacemaker-integration - right?
It might be that the disk-watchers are hanging on some I/O, so that
we don't see any logs from them.
Did that happen just once or can you reproduce the issue?
If you are not using pacemaker-integration so far, enabling it might be a
way to increase reliability: if sbd sees the other node, it is content
even without getting a response from the disks. Of course, whether that is
supported with a 2-node-cluster (or at all) depends on your pacemaker and
sbd versions. sbd e.g. would have to have at least
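For reference, a sketch of what enabling pacemaker-integration looks like in sbd's config file. The device paths below are placeholders for your three iSCSI targets; the option names (SBD_DEVICE, SBD_PACEMAKER, SBD_WATCHDOG_*) are the standard ones, but check your distribution's sbd documentation for supported values:

```shell
# /etc/sysconfig/sbd (location varies by distribution) -- sketch only
# Three shared iSCSI disks, semicolon-separated; without pacemaker-
# integration, sbd needs a majority (2 of 3) of them alive.
SBD_DEVICE="/dev/disk/by-id/iscsi-disk1;/dev/disk/by-id/iscsi-disk2;/dev/disk/by-id/iscsi-disk3"

# Let sbd consult pacemaker: if the local node is quorate and healthy
# from pacemaker's point of view, sbd can stay content even while the
# disks are temporarily unreachable.
SBD_PACEMAKER=yes

# Hardware (or softdog) watchdog device and its timeout.
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
```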
> sbd is too critical to share the I/O path with others.
> Very likely the workload is too heavy, the iSCSI connections are broken, and
> sbd loses access to the disks; sbd then uses sysrq 'b' to reboot the node
> brutally and immediately.
> Regarding watchdog-reboot: it kicks in when sbd is not able to tickle the
> watchdog in time, e.g. when sbd is starved of CPU or has crashed. That is
> crucial too, but not likely the case here.
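To make that mechanism concrete, here is a toy model of the inquisitor-style liveness check behind the warning quoted at the top of the thread. This is not sbd's actual code: the class name, the majority rule over the disk watchers ("servants"), and the numbers are illustrative assumptions, chosen to reproduce the shape of the logged message.

```python
class Inquisitor:
    """Toy model of sbd's inquisitor liveness check (illustrative only,
    not sbd's real implementation)."""

    def __init__(self, n_servants=3, threshold_s=36):
        self.threshold_s = threshold_s
        self.quorum = n_servants // 2 + 1   # majority of the 3 disk watchers
        self.last_report = {}               # servant -> last check-in time
        self.last_quorate = 0.0             # last time a majority was alive

    def report(self, servant, now):
        # A servant (disk watcher) checked in successfully.
        self.last_report[servant] = now

    def check(self, now):
        # Servants heard from within the threshold count as healthy.
        healthy = sum(1 for t in self.last_report.values()
                      if now - t <= self.threshold_s)
        if healthy >= self.quorum:
            self.last_quorate = now
            return None                     # all good: keep feeding the watchdog
        silence = now - self.last_quorate
        if silence > self.threshold_s:
            # This is where sbd logs the warning and reboots via sysrq 'b'.
            return (f"No liveness for {silence:.0f} s exceeds threshold of "
                    f"{self.threshold_s} s (healthy servants: {healthy})")
        return None

inq = Inquisitor()
for s in ("disk1", "disk2", "disk3"):
    inq.report(s, 0.0)
inq.check(0.0)                  # quorate: 3 of 3 disks answering
inq.report("disk1", 37.0)       # two iSCSI disks have gone silent
msg = inq.check(37.0)           # majority lost for longer than the threshold
```

With two of the three disks silent, the model emits the same shape of message as the log above: "No liveness for 37 s exceeds threshold of 36 s (healthy servants: 1)".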
> Merry X'mas and Happy New Year!