[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Klaus Wenninger kwenning at redhat.com
Fri Oct 7 08:47:04 UTC 2016


On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in
> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>> On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>> Hi All,
>>>
>>>>> If a user uses sbd, can the cluster avoid the problem of crmd being frozen by SIGSTOP?
>>>>
>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>> will reboot the node (unless the watchdog fails).
>>> Thank you for the comment.
>>>
>>> We are examining a watchdog for crmd, too.
>>> I will comment again once the examination has advanced.
>> Was thinking of doing a small test implementation going
>> a little in the direction Lars Ellenberg had been pointing out.
>>
>> a couple of thoughts I had so far:
>>
>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>   an application can use to create a watchdog within sbd
> Why does it have to be done within sbd?
Not necessarily; it could just as well be spun off into a project of its
own, or something that already exists could be used.
I remember once adding a dbus-interface to
https://sourceforge.net/projects/watchdog/ for a project.
If you have a suggestion, I'm open to it.
Starting from sbd would have the advantage of a smooth start:

- cluster/pacemaker-watcher are there already and can
  be replaced/moved over time
- the lifecycle of the daemon (when it is started/stopped) is
  already something that is in the code and in people's minds
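
To make the idea a bit more concrete, here is a rough sketch of how several
named watchdogs, each with its own timeout, could share a single hardware
watchdog. This is Python purely for brevity and is not sbd code; the names
WatchdogRelay, register, feed, tick and the feed_hardware callback are all
made up for illustration, and the real transport (libqb or DBus) is left out.

```python
import time
from typing import Callable, Dict, List, Tuple


class WatchdogRelay:
    """Hypothetical relay: many named software watchdogs, one hardware watchdog."""

    def __init__(self, feed_hardware: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # feed_hardware is an assumed callback that pets the real watchdog device.
        self._feed_hardware = feed_hardware
        self._clock = clock
        # name -> (timeout in seconds, time of last feed)
        self._clients: Dict[str, Tuple[float, float]] = {}

    def register(self, name: str, timeout: float) -> None:
        """Create a software watchdog identified by a name and a timeout."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._clients[name] = (timeout, self._clock())

    def feed(self, name: str) -> None:
        """Called by the observed application (e.g. crmd) to show it is alive."""
        timeout, _ = self._clients[name]
        self._clients[name] = (timeout, self._clock())

    def expired(self) -> List[str]:
        """Names of all clients that have not fed within their timeout."""
        now = self._clock()
        return sorted(name for name, (timeout, last) in self._clients.items()
                      if now - last > timeout)

    def tick(self) -> bool:
        """Feed the hardware watchdog only if no client has expired."""
        if self.expired():
            return False  # stop feeding, so the hardware watchdog fires
        self._feed_hardware()
        return True
```

The relay would call tick() periodically, at an interval shorter than the
hardware timeout. A crmd frozen by SIGSTOP stops calling feed(), tick() then
stops feeding the hardware, and the node gets rebooted.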

>> - parameters for the first are a name and a timeout
>>
>> - first use-case would be crmd observation
>>
>> - later on we could think of removing pacemaker dependencies
>>   from sbd by moving the actual implementation of
>>   pacemaker-watcher and probably cluster-watcher as well
>>   into pacemaker - using the new API
>>
>> - this of course creates sbd dependency within pacemaker so
>>   that it would make sense to offer a simpler and self-contained
>>   implementation within pacemaker as an alternative
> I think the watchdog interface is so simple that you don't need a relay for it. The only limit I can imagine is the number of watchdogs available on some specific hardware.
That is the point ;-)
>>   thus it would be favorable to have the dependency
>>   within a non-compulsory pacemaker-rpm so that
>>   we can offer an alternative that doesn't use sbd
>>   at maybe the cost of being less reliable or one
>>   that owns a hardware-watchdog by itself for systems
>>   where this is still unused.
>>
>>   - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>>   - or via an additional daemon
>>
>> What did you have in mind?
>> Maybe it makes sense to synchronize...
>>
>> Regards,
>> Klaus
>>  
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp 
>>>> Cc: 
>>>> Date: 2016/10/5, Wed 23:08
>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>  <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52 in message
>>>> <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>  Hi All,
>>>>>
>>>>>  Has a final conclusion been reached about this problem?
>>>>>
>>>>>  If a user uses sbd, can the cluster avoid the problem of crmd being frozen by SIGSTOP?
>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>> will reboot the node (unless the watchdog fails).
>>>>
>>>>>  We are interested in this problem, too.
>>>>>
>>>>>  Best Regards,
>>>>>
>>>>>  Hideo Yamauchi.
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>>>  Users mailing list: Users at clusterlabs.org 
>>>>>  http://clusterlabs.org/mailman/listinfo/users 
>>>>>
>>>>>  Project Home: http://www.clusterlabs.org 
>>>>>  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>>>>  Bugs: http://bugs.clusterlabs.org 