[ClusterLabs] SBD restarted the node while pacemaker in maintenance mode
kwenning at redhat.com
Thu Jan 2 05:35:32 EST 2020
On 12/26/19 9:27 AM, Roger Zhou wrote:
> On 12/24/19 11:48 AM, Jerry Kross wrote:
>> The pacemaker cluster manages a 2 node database cluster configured to use 3
>> iscsi disk targets in its stonith configuration. The pacemaker cluster was put
>> in maintenance mode but we see SBD writing to the system logs. And just after
>> these logs, the production node was restarted.
>> sbd: warning: inquisitor_child: Latency: No liveness for 37 s exceeds
>> threshold of 36 s (healthy servants: 1)
>> I see these messages logged, and then the node was restarted. I suspect it
>> was the softdog module that restarted the node, but I don't see it in the logs.
Just to understand your config ...
You are using 3 block-devices with quorum amongst each other, without
pacemaker-integration - right?
It might be that the disk-watchers are hanging on some I/O, so that
we don't see any logs from them.
Did that happen just once or can you reproduce the issue?
If you are not using pacemaker-integration so far, enabling it might be a
way to increase reliability: if sbd sees the other node, it is content
even without getting a response from the disks. Of course, whether that is
supported with a 2-node-cluster (or at all) depends on your pacemaker and
sbd versions. sbd e.g. would have to have at least
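For reference, a sketch of what enabling pacemaker-integration looks like in sbd's config file. The device paths below are placeholders for your three iSCSI targets; the option names (SBD_DEVICE, SBD_PACEMAKER, SBD_WATCHDOG_*) are the standard ones, but check your distribution's sbd documentation for supported values:

```shell
# /etc/sysconfig/sbd (location varies by distribution) -- sketch only
# Three shared iSCSI disks, semicolon-separated; without pacemaker-
# integration, sbd needs a majority (2 of 3) of them alive.
SBD_DEVICE="/dev/disk/by-id/iscsi-disk1;/dev/disk/by-id/iscsi-disk2;/dev/disk/by-id/iscsi-disk3"

# Let sbd consult pacemaker: if the local node is quorate and healthy
# from pacemaker's point of view, sbd can stay content even while the
# disks are temporarily unreachable.
SBD_PACEMAKER=yes

# Hardware (or softdog) watchdog device and its timeout.
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
```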
> sbd is too critical to share the I/O path with others.
> Very likely the workload is too heavy, the iSCSI connections are broken, and
> sbd loses access to the disks; sbd then uses sysrq 'b' to reboot the node
> brutally and immediately.
> Regarding watchdog-reboot: it kicks in when sbd is not able to tickle the
> watchdog in time, e.g. when sbd is starved of CPU or has crashed. That is
> crucial too, but not likely the case here.
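To make that mechanism concrete, here is a toy model of the inquisitor-style liveness check behind the warning quoted at the top of the thread. This is not sbd's actual code: the class name, the majority rule over the disk watchers ("servants"), and the numbers are illustrative assumptions, chosen to reproduce the shape of the logged message.

```python
class Inquisitor:
    """Toy model of sbd's inquisitor liveness check (illustrative only,
    not sbd's real implementation)."""

    def __init__(self, n_servants=3, threshold_s=36):
        self.threshold_s = threshold_s
        self.quorum = n_servants // 2 + 1   # majority of the 3 disk watchers
        self.last_report = {}               # servant -> last check-in time
        self.last_quorate = 0.0             # last time a majority was alive

    def report(self, servant, now):
        # A servant (disk watcher) checked in successfully.
        self.last_report[servant] = now

    def check(self, now):
        # Servants heard from within the threshold count as healthy.
        healthy = sum(1 for t in self.last_report.values()
                      if now - t <= self.threshold_s)
        if healthy >= self.quorum:
            self.last_quorate = now
            return None                     # all good: keep feeding the watchdog
        silence = now - self.last_quorate
        if silence > self.threshold_s:
            # This is where sbd logs the warning and reboots via sysrq 'b'.
            return (f"No liveness for {silence:.0f} s exceeds threshold of "
                    f"{self.threshold_s} s (healthy servants: {healthy})")
        return None

inq = Inquisitor()
for s in ("disk1", "disk2", "disk3"):
    inq.report(s, 0.0)
inq.check(0.0)                  # quorate: 3 of 3 disks answering
inq.report("disk1", 37.0)       # two iSCSI disks have gone silent
msg = inq.check(37.0)           # majority lost for longer than the threshold
```

With two of the three disks silent, the model emits the same shape of message as the log above: "No liveness for 37 s exceeds threshold of 36 s (healthy servants: 1)".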
> Merry X'mas and Happy New Year!