[ClusterLabs] SBD restarted the node while pacemaker in maintenance mode

Thu Dec 26 03:27:13 EST 2019

On 12/24/19 11:48 AM, Jerry Kross wrote:
> Hi,
> The pacemaker cluster manages a 2 node database cluster configured to use 3 
> iscsi disk targets in its stonith configuration. The pacemaker cluster was put 
> in maintenance mode but we see SBD writing to the system logs. And just after 
> these logs, the production node was restarted.
> Log:
> sbd[5955]:  warning: inquisitor_child: Latency: No liveness for 37 s exceeds 
> threshold of 36 s (healthy servants: 1)
> I see these messages logged and then the node was restarted. I suspect if it 
> was the softdog module that restarted the node but I don't see it in the logs. 

sbd is too critical to share its I/O path with other workloads.

Very likely, the workload is too heavy, the iSCSI connections break, and 
sbd loses access to the disks; sbd then uses sysrq 'b' to reboot the node 
immediately, without a clean shutdown.
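
To see how close the node was to the limit, the numbers can be pulled out of 
the warning quoted above. This is only a rough sketch: the regular expression 
is modelled on that single log line (unwrapped onto one line), the function 
name parse_liveness_warning is made up for illustration, and other sbd 
versions may word the message differently.

```python
import re

# Pattern modelled on the one warning quoted in this thread (an assumption);
# it is not taken from the sbd source code.
LIVENESS_RE = re.compile(
    r"No liveness for (?P<seen>\d+) s exceeds threshold of (?P<limit>\d+) s "
    r"\(healthy servants: (?P<healthy>\d+)\)"
)


def parse_liveness_warning(line):
    """Return the numbers from an sbd liveness warning, or None if no match."""
    match = LIVENESS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# The log line from the original report, joined back onto one line.
sample = (
    "sbd[5955]:  warning: inquisitor_child: Latency: No liveness for 37 s "
    "exceeds threshold of 36 s (healthy servants: 1)"
)
info = parse_liveness_warning(sample)
print(info)
```

Running this over the system log around the time of the reboot would show 
whether the liveness gap crept up gradually or jumped at once.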

As for a watchdog reboot, it kicks in when sbd is not able to tickle the 
watchdog in time, e.g. when sbd is starved of CPU or has crashed. It is 
crucial too, but not likely the case here.
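
The watchdog idea can be illustrated with a toy model. This is not how sbd or 
softdog is implemented; the class ToyWatchdog and the 5 second timeout are 
invented for the example, and time is passed in explicitly so that the 
behaviour is easy to follow.

```python
class ToyWatchdog:
    """Toy model of a watchdog timer; not the sbd or softdog implementation."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.deadline = now + timeout_s

    def tickle(self, now):
        # Each tickle pushes the deadline one full timeout into the future.
        self.deadline = now + self.timeout_s

    def expired(self, now):
        # A real watchdog would reset the machine at this point.
        return now >= self.deadline


wd = ToyWatchdog(timeout_s=5)      # hypothetical 5 second timeout
wd.tickle(now=3)                   # daemon is healthy: deadline moves to t=8
fired_while_healthy = wd.expired(now=7)
# The daemon is now starved or crashed and never tickles again.
fired_after_stall = wd.expired(now=9)
print(fired_while_healthy, fired_after_stall)
```

As long as the daemon keeps tickling in time the deadline never arrives; once 
it stalls for longer than the timeout, the watchdog fires on its own.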

Merry X'mas and Happy New Year!
Roger