[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Fri Oct 7 17:10:30 EDT 2016

Hi All,

Our users do not necessarily use sbd.

I confirmed that the WD (watchdog) service of corosync is one way to handle this without sbd.
Through CMAP, Pacemaker can have the WD service monitor the Pacemaker processes and trigger the watchdog.
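
Whichever component ends up owning the watchdog, the feeding itself is simple. The following is only a minimal sketch of the idea from the quoted thread ("crmd should feed a watchdog"), not code from Pacemaker, corosync, or sbd. It assumes the standard Linux watchdog character device, where any write counts as a keepalive and writing "V" just before close asks the driver to stop the timer. The function names, the device path, and the interval are assumptions chosen for illustration.

```python
import os
import time

MAGIC_CLOSE = b"V"  # "magic close" character understood by Linux watchdog drivers


def feed(fd):
    """Reset the countdown; on Linux any write to the device is a keepalive."""
    os.write(fd, b"\0")


def disarm_and_close(fd):
    """Clean shutdown: request that the timer stop, then close the device.

    Drivers configured with 'nowayout' ignore the request and keep counting.
    """
    os.write(fd, MAGIC_CLOSE)
    os.close(fd)


def feed_loop(device, interval, rounds):
    """Feed `device` every `interval` seconds, `rounds` times, then disarm.

    A process frozen by SIGSTOP never reaches the next feed(), so the timer
    expires and the node is reset (unless the watchdog itself fails).
    """
    fd = os.open(device, os.O_WRONLY)  # opening a real watchdog device arms it
    for _ in range(rounds):
        feed(fd)
        time.sleep(interval)
    disarm_and_close(fd)
    return rounds
```

Something like feed_loop("/dev/watchdog", 5, rounds) would need root and arms a real watchdog, so it is best tried against a scratch file or a test machine first. The device can typically be opened by only one process at a time, which is why a shared relay is being discussed in the thread below.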

We can prepare a patch for Pacemaker.

Has the use of the WD service already been discussed?

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Klaus Wenninger <kwenning at redhat.com>
> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>; users at clusterlabs.org
> Cc: 
> Date: 2016/10/7, Fri 17:47
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
> 
> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>  Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in
>>  message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>>>  On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>  Hi All,
>>>> 
>>>>>>  If a user uses sbd, can the cluster avoid the problem of crmd being frozen by SIGSTOP?
>>>>>   
>>>>>  As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>  will reboot the node (unless the watchdog fails).
>>>>  Thank you for comment.
>>>> 
>>>>  We are also examining a watchdog for crmd.
>>>>  I will comment again once the examination has progressed.
>>>  Was thinking of doing a small test implementation going
>>>  a little in the direction Lars Ellenberg had been pointing out.
>>> 
>>>  a couple of thoughts I had so far:
>>> 
>>>  - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>    that an application can use to create a watchdog within sbd
>>  Why does it have to be done within sbd?
> Not necessarily; it could also be spun off into a project of its own, or
> something that already exists could be used.
> I remember once adding a D-Bus interface to
> https://sourceforge.net/projects/watchdog/ for a project.
> If you have a suggestion, I'm open to it.
> Building on sbd would have the advantage of a smooth start:
> 
> - cluster/pacemaker-watcher are there already and can
>   be replaced/moved over time
> - the lifecycle of the daemon (when started/stopped) is
>   already something that is in the code and in the people's minds
> 
>>>  - parameters for the first are a name and a timeout
>>> 
>>>  - first use-case would be crmd observation
>>> 
>>>  - later on we could think of removing pacemaker dependencies
>>>    from sbd by moving the actual implementation of
>>>    pacemaker-watcher and probably cluster-watcher as well
>>>    into pacemaker - using the new API
>>> 
>>>  - this of course creates sbd dependency within pacemaker so
>>>    that it would make sense to offer a simpler and self-contained
>>>    implementation within pacemaker as an alternative
>>  I think the watchdog interface is so simple that you don't need a relay
>>  for it. The only limit I can imagine is the number of watchdogs available on
>>  some specific hardware.
> That is the point ;-)
>>>    thus it would be favorable to have the dependency
>>>    within a non-compulsory pacemaker-rpm so that
>>>    we can offer an alternative that doesn't use sbd
>>>    at maybe the cost of being less reliable or one
>>>    that owns a hardware-watchdog by itself for systems
>>>    where this is still unused.
>>> 
>>>    - e.g. via some kind of plugin (Andrew forgive me -
>>>                                                     no pils ;-) )
>>>    - or via an additional daemon
>>> 
>>>  What did you have in mind?
>>>  Maybe it makes sense to synchronize...
>>> 
>>>  Regards,
>>>  Klaus
>>>   
>>>> 
>>>>  Best Regards,
>>>>  Hideo Yamauchi.
>>>> 
>>>> 
>>>> 
>>>>  ----- Original Message -----
>>>>>  From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>>  To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp 
>>>>>  Cc: 
>>>>>  Date: 2016/10/5, Wed 23:08
>>>>>  Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>   <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>>>>  in message
>>>>>  <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>>   Hi All,
>>>>>> 
>>>>>>   Has a final conclusion been reached on this problem?
>>>>>> 
>>>>>>   If a user uses sbd, can the cluster avoid the problem of crmd being frozen by SIGSTOP?
>>>>>  As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>  will reboot the node (unless the watchdog fails).
>>>>> 
>>>>>>   We are interested in this problem, too.
>>>>>> 
>>>>>>   Best Regards,
>>>>>> 
>>>>>>   Hideo Yamauchi.
>>>>>> 
>>>>>> 
>>>>>>   _______________________________________________
>>>>>>   Users mailing list: Users at clusterlabs.org 
>>>>>>   http://clusterlabs.org/mailman/listinfo/users 
>>>>>> 
>>>>>>   Project Home: http://www.clusterlabs.org 
>>>>>>   Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>   Bugs: http://bugs.clusterlabs.org 
>>> 
>>> 
>> 
>> 
> 
> 
>