[ClusterLabs] Antw: Re: SBD as watchdog daemon

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Apr 16 01:55:10 EDT 2019


>>> ???? ???????? <splarv at ya.ru> wrote on 15.04.2019 at 13:03 in message
<566FE1CD-B8FD-41E0-BC07-1722BE14E8B6 at ya.ru>:

> 
>> On 14 Apr 2019, at 10:12, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 
> Thanks for the explanation, I think it would be a good addition to the SBD
> manual. (The SBD manual needs it.) But my problem lies on another plane.
>
> I investigated SBD. A common watchdog is much simpler: one infinite loop
> runs some checks and writes to the watchdog device. Any mistake, freeze or
> segfault, and the watchdog will fire. But SBD has a different design. First
> of all, there is not one infinite loop. There are three different
> processes: one is the «inquisitor» and the two others are «servants» for
> corosync and pacemaker. And there is complex logic inside SBD for them to
> check each other. But even that is not the problem. Both servants send a
> health heartbeat to the inquisitor every second. But they send it not as
> the result of checking corosync or pacemaker, as one would expect, but from
> the internal buffer variable «servant_health». And if corosync or pacemaker
> is frozen (which can be emulated with `kill -s STOP`), this variable never
> changes and the servants keep sending a good health status to the
> inquisitor forever. And this is a bug. I am looking for a way to fix it.
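
The failure mode described above can be sketched roughly as follows. This is
only an illustration in Python, not SBD's actual code: the class `Servant`,
its methods and the `max_age` parameter are made up for the example; only the
name `servant_health` is taken from the description above.

```python
import time


class Servant:
    """Hypothetical stand-in for an SBD servant; not SBD's actual code."""

    def __init__(self, max_age):
        self.max_age = max_age          # seconds a check result stays valid
        self.servant_health = "ok"      # last result obtained from the daemon
        self.last_update = time.monotonic()

    def record_check(self, health, now=None):
        """Called only when the monitored daemon actually answers."""
        self.servant_health = health
        self.last_update = time.monotonic() if now is None else now

    def naive_heartbeat(self):
        """The reported behaviour: resend the cached value unconditionally."""
        return self.servant_health

    def guarded_heartbeat(self, now=None):
        """Report the cached value only while it is still fresh enough."""
        now = time.monotonic() if now is None else now
        if now - self.last_update > self.max_age:
            return "stale"
        return self.servant_health
```

If the daemon is stopped, `record_check` is never called again, so
`naive_heartbeat` keeps returning the last good status, while
`guarded_heartbeat` starts returning "stale" once `max_age` has passed.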

I had a similar design, where a monitor updated some measurements in a shared
value and another thread read them. The design idea was to avoid blocking the
reader. But if the monitor blocks, you read old data all the time. So I simply
added a modulo-16 counter that is incremented whenever the data is updated. If
the counter does not change, there's a 15/16 probability that the data wasn't
actually updated. As everything was performance-critical, I did not use the
luxury of writing a time-stamp into shared memory... ;-)
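
A minimal sketch of that counter idea, in Python for brevity. The names
`SharedSlot`, `Reader`, `publish` and `poll` are invented for this example,
and a real shared-memory version would also have to make sure the writes
become visible to the reader in the right order.

```python
class SharedSlot:
    """Hypothetical shared slot: a value plus a 4-bit update counter."""

    def __init__(self):
        self.value = None
        self.count = 0                  # wraps around at 16

    def publish(self, value):
        """Writer side: store the value, then bump the counter modulo 16."""
        self.value = value
        self.count = (self.count + 1) % 16


class Reader:
    """Non-blocking reader that remembers the last counter it saw."""

    def __init__(self, slot):
        self.slot = slot
        self.last_count = slot.count

    def poll(self):
        """Return (value, fresh); fresh is False if the counter did not move."""
        count = self.slot.count
        fresh = count != self.last_count
        self.last_count = count
        return self.slot.value, fresh
```

The blind spot is the wrap-around: if exactly 16 (or a multiple of 16) updates
happen between two polls, the reader sees the same counter and treats fresh
data as stale.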

Regards,
Ulrich

