[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Oct 7 02:14:20 EDT 2016


>>> Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in
message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
> On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>> Hi All,
>>
>>>> If a user uses sbd, can the cluster avoid the problem of crmd being hit by SIGSTOP?
>>>
>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>> will reboot the node (unless the watchdog fails).
>>
>> Thank you for the comment.
>>
>> We are examining a watchdog for crmd, too.
>> I will comment again once the examination has advanced.
> 
> Was thinking of doing a small test implementation going
> a little in the direction Lars Ellenberg had been pointing out.
> 
> a couple of thoughts I had so far:
> 
> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>   an application can use to create a watchdog within sbd

Why does it have to be done within sbd?

> 
> - parameters for the first are a name and a timeout
> 
> - first use-case would be crmd observation
> 
> - later on we could think of removing pacemaker dependencies
>   from sbd by moving the actual implementation of
>   pacemaker-watcher and probably cluster-watcher as well
>   into pacemaker - using the new API
> 
> - this of course creates sbd dependency within pacemaker so
>   that it would make sense to offer a simpler and self-contained
>   implementation within pacemaker as an alternative

I think the watchdog interface is so simple that you don't need a relay for it. The only limit I can imagine is the number of watchdogs available on some specific hardware.
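
To illustrate how small that interface is, here is a minimal sketch in Python, assuming the usual Linux watchdog character device at /dev/watchdog. The class name Watchdog and its methods are hypothetical names chosen for this example; they are not taken from sbd or pacemaker. Setting the timeout (normally done via an ioctl) is left out.

```python
import os


class Watchdog:
    """Minimal wrapper around a Linux watchdog character device (sketch)."""

    def __init__(self, path="/dev/watchdog"):
        # On a real watchdog device, opening it arms the timer.
        self.fd = os.open(path, os.O_WRONLY)

    def feed(self):
        # Any write counts as a keepalive and restarts the countdown.
        os.write(self.fd, b"\0")

    def close(self, disarm=True):
        # Writing 'V' right before close asks the driver to stop the timer
        # ("magic close"); drivers running with nowayout ignore the request.
        if disarm:
            os.write(self.fd, b"V")
        os.close(self.fd)
```

A daemon would call feed() from its main loop at an interval shorter than the watchdog timeout. If the process is frozen (for example by SIGSTOP), the feeding stops and the hardware resets the node.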

> 
>   thus it would be favorable to have the dependency
>   within a non-compulsory pacemaker-rpm so that
>   we can offer an alternative that doesn't use sbd
>   at maybe the cost of being less reliable or one
>   that owns a hardware-watchdog by itself for systems
>   where this is still unused.
> 
>   - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>   - or via an additional daemon
> 
> What did you have in mind?
> Maybe it makes sense to synchronize...
> 
> Regards,
> Klaus
>  
>>
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp 
>>> Cc: 
>>> Date: 2016/10/5, Wed 23:08
>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>
>>>>>>  <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52 in message
>>> <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>  Hi All,
>>>>
>>>>  Was a final conclusion reached on this problem?
>>>>
>>>>  If a user uses sbd, can the cluster avoid the problem of crmd being hit by SIGSTOP?
>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>> will reboot the node (unless the watchdog fails).
>>>
>>>>  We are interested in this problem, too.
>>>>
>>>>  Best Regards,
>>>>
>>>>  Hideo Yamauchi.
>>>>
>>>>
>>>>  _______________________________________________
>>>>  Users mailing list: Users at clusterlabs.org 
>>>>  http://clusterlabs.org/mailman/listinfo/users 
>>>>
>>>>  Project Home: http://www.clusterlabs.org 
>>>>  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>>>  Bugs: http://bugs.clusterlabs.org 