[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Mon Oct 10 12:03:13 UTC 2016

On 10/07/2016 11:10 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi All,
>
> Our users may not necessarily use sbd.
>
> I confirmed that the WD service of corosync is one way to do this without sbd.
> Pacemaker can have its processes watched by the WD service using CMAP, so that the watchdog can be triggered.

Have to have a look at that...
But if we establish some in-between-layer in pacemaker we could have this
as one of the possibilities besides e.g. sbd (with enhanced API), going for
a watchdog-device directly, ...
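
To make the in-between-layer idea a bit more concrete, here is a minimal
Python sketch. It is not existing pacemaker, sbd or corosync code. It assumes
the API shape discussed further down in this thread (a software watchdog is
created with a name and a timeout) and keeps the backend pluggable. The names
WatchdogRelay, register, feed, expired, tick and pet_backend are all
hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class WatchdogRelay:
    """Hypothetical in-between layer: many named software watchdogs, one backend."""

    def __init__(self, pet_backend: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # pet_backend is an assumption: whatever keeps the real watchdog quiet
        # (sbd, corosync WD service, or a watchdog device owned directly).
        self._pet_backend = pet_backend
        self._clock = clock  # injectable so the logic can be exercised in tests
        self._dogs: Dict[str, Tuple[float, float]] = {}  # name -> (timeout, last_fed)

    def register(self, name: str, timeout: float) -> None:
        """Create a software watchdog identified by a name and a timeout in seconds."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._dogs[name] = (timeout, self._clock())

    def feed(self, name: str) -> None:
        """Called periodically by the observed client, e.g. crmd."""
        timeout, _ = self._dogs[name]  # raises KeyError if never registered
        self._dogs[name] = (timeout, self._clock())

    def expired(self) -> List[str]:
        """Names of all clients that have not fed within their timeout."""
        now = self._clock()
        return sorted(name for name, (timeout, last) in self._dogs.items()
                      if now - last > timeout)

    def tick(self) -> bool:
        """Pet the backend only if no client is overdue; return True if petted."""
        if self.expired():
            return False
        self._pet_backend()
        return True
```

In this sketch a crmd that is frozen by SIGSTOP simply stops calling feed(),
tick() stops petting the backend, and the real watchdog behind it would
eventually fire. Many observed processes can share one hardware watchdog that
way, whichever backend is plugged in.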

>
>
> We can prepare a patch for pacemaker.

Always helpful to discuss/clarify an idea once some code is available ...

> Is the discussion about using the WD service already over?

Not from my pov. Just a day off ;-)

>
>
> Best Regards,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
>> From: Klaus Wenninger <kwenning at redhat.com>
>> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>; users at clusterlabs.org
>> Cc: 
>> Date: 2016/10/7, Fri 17:47
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>  Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in
>>>  message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>>>>  On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>  Hi All,
>>>>>
>>>>>>>  If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>   
>>>>>>  As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>  will reboot the node (unless the watchdog fails).
>>>>>  Thank you for comment.
>>>>>
>>>>>  We are examining a watchdog for crmd, too.
>>>>>  I will comment further once the examination has advanced.
>>>>  Was thinking of doing a small test implementation going
>>>>  a little in the direction Lars Ellenberg had been pointing out.
>>>>
>>>>  a couple of thoughts I had so far:
>>>>
>>>>  - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>    an application can use to create a watchdog within sbd
>>>  Why does it have to be done within sbd?
>> Not necessarily; it could also be spun off into its own project, or
>> something that already exists could be used.
>> I remember having added a dbus-interface to
>> https://sourceforge.net/projects/watchdog/ for a project once.
>> If you have a suggestion I'm open.
>> Starting from sbd would have the advantage of a smooth start:
>>
>> - cluster/pacemaker-watcher are there already and can
>>   be replaced/moved over time
>> - the lifecycle of the daemon (when started/stopped) is
>>   already something that is in the code and in the people's minds
>>
>>>>  - parameters for the first are a name and a timeout
>>>>
>>>>  - first use-case would be crmd observation
>>>>
>>>>  - later on we could think of removing pacemaker dependencies
>>>>    from sbd by moving the actual implementation of
>>>>    pacemaker-watcher and probably cluster-watcher as well
>>>>    into pacemaker - using the new API
>>>>
>>>>  - this of course creates sbd dependency within pacemaker so
>>>>    that it would make sense to offer a simpler and self-contained
>>>>    implementation within pacemaker as an alternative
>>>  I think the watchdog interface is so simple that you don't need a relay for it.
>>>  The only limit I can imagine is the number of watchdogs available on some
>>>  specific hardware.
>> That is the point ;-)
>>>>    thus it would be favorable to have the dependency
>>>>    within a non-compulsory pacemaker-rpm so that
>>>>    we can offer an alternative that doesn't use sbd
>>>>    at maybe the cost of being less reliable or one
>>>>    that owns a hardware-watchdog by itself for systems
>>>>    where this is still unused.
>>>>
>>>>    - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>>>>    - or via an additional daemon
>>>>
>>>>  What did you have in mind?
>>>>  Maybe it makes sense to synchronize...
>>>>
>>>>  Regards,
>>>>  Klaus
>>>>   
>>>>>  Best Regards,
>>>>>  Hideo Yamauchi.
>>>>>
>>>>>
>>>>>
>>>>>  ----- Original Message -----
>>>>>>  From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>>>  To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp 
>>>>>>  Cc: 
>>>>>>  Date: 2016/10/5, Wed 23:08
>>>>>>  Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>   <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52 in message
>>>>>>  <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>   Hi All,
>>>>>>>
>>>>>>>   Was the final conclusion given about this problem?
>>>>>>>
>>>>>>>   If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>  As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>  will reboot the node (unless the watchdog fails).
>>>>>>
>>>>>>>   We are interested in this problem, too.
>>>>>>>
>>>>>>>   Best Regards,
>>>>>>>
>>>>>>>   Hideo Yamauchi.
>>>>>>>
>>>>>>>
>>>>>>>   _______________________________________________
>>>>>>>   Users mailing list: Users at clusterlabs.org 
>>>>>>>   http://clusterlabs.org/mailman/listinfo/users 
>>>>>>>
>>>>>>>   Project Home: http://www.clusterlabs.org 
>>>>>>>   Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>   Bugs: http://bugs.clusterlabs.org 