[ClusterLabs] SBD as watchdog daemon

Mon Apr 15 07:03:21 EDT 2019

> On 14 Apr 2019, at 10:12, Andrei Borzenkov <arvidjaar at gmail.com> wrote:

Thanks for the explanation. I think it would be a good addition to the SBD manual (the manual needs it). But my problem lies elsewhere.

I investigated SBD. A common watchdog daemon is much simpler: one infinite loop that runs some tests and writes to the watchdog device. On any mistake, freeze or segfault, the watchdog fires. But SBD has a different design. First of all, there is not a single infinite loop. There are three different processes: one «inquisitor» and two «servants», one for corosync and one for pacemaker. And there is complex logic inside SBD for these processes to check each other. But even that is not the problem. Both servants send a health heartbeat to the inquisitor every second. But… they send it not as the result of checking corosync or pacemaker, as one would expect, but from the internal buffer variable «servant_health». If corosync or pacemaker is frozen (this can be emulated with `kill -s STOP`), this variable never changes, and the servants keep sending a good health status to the inquisitor forever. This is a bug. I am looking for a way to fix it.
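
To make the failure mode concrete, here is a minimal sketch in Python. It is not SBD's code (SBD is written in C). Only the name `servant_health` comes from the description above. The `Servant` class, its methods, `max_age` and `FakeClock` are hypothetical names used for illustration. The sketch contrasts reporting a cached value with reporting a value that expires when the monitored daemon stops refreshing it, which is one possible direction for a fix.

```python
import time

HEALTH_OK = "ok"
HEALTH_UNKNOWN = "unknown"


class Servant:
    """Hypothetical model of a servant; not SBD's actual implementation."""

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age            # seconds a cached status stays valid
        self.clock = clock                # injectable clock, eases testing
        self.servant_health = HEALTH_UNKNOWN
        self.last_update = None           # time of the last real daemon check

    def on_daemon_event(self, health):
        # Called only when the monitored daemon actually responds.
        self.servant_health = health
        self.last_update = self.clock()

    def report_cached(self):
        # The behaviour described above: the heartbeat repeats the buffered
        # value, so a frozen daemon still looks healthy.
        return self.servant_health

    def report_fresh(self):
        # Possible fix: a status older than max_age is no longer trusted.
        if self.last_update is None:
            return HEALTH_UNKNOWN
        if self.clock() - self.last_update > self.max_age:
            return HEALTH_UNKNOWN
        return self.servant_health


class FakeClock:
    """Manually advanced clock to simulate a daemon frozen by SIGSTOP."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds
```

If a servant built with `max_age=5.0` receives one good event and the fake clock is then advanced by 10 seconds with no further events, `report_cached()` still returns the good status, while `report_fresh()` falls back to the unknown status. The inquisitor could then treat the unknown status as a reason to stop feeding the watchdog.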