[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
Klaus Wenninger
kwenning at redhat.com
Mon Oct 10 08:03:13 EDT 2016
On 10/07/2016 11:10 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi All,
>
> Our users may not necessarily use sbd.
>
> I confirmed that the WD service of corosync offers one way to do this without sbd.
> Pacemaker can have its own processes monitored by the WD service via CMAP, which can then trigger the watchdog.
I'll have to take a look at that...
But if we establish some in-between layer in pacemaker, we could have this
as one of the possible backends, besides e.g. sbd (with an enhanced API) or
using a watchdog device directly, ...
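
To make the idea of an in-between layer a bit more concrete, here is a minimal
sketch in Python, not actual pacemaker or sbd code. It assumes each client
(e.g. crmd) registers a watchdog with a name and a timeout, and that the layer
only pets a single backend while every client has fed in time. All names
(WatchdogMux, pet_backend, register, feed, tick) are hypothetical; pet_backend
stands for whatever is chosen underneath (sbd, corosync WD service, or a
watchdog device).

```python
import time
from typing import Callable, Dict, List


class WatchdogMux:
    """Hypothetical layer: many named software watchdogs, one backend."""

    def __init__(self, pet_backend: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._pet_backend = pet_backend  # assumption: pets the real watchdog
        self._clock = clock              # injectable so it can be tested
        self._timeouts: Dict[str, float] = {}
        self._last_fed: Dict[str, float] = {}

    def register(self, name: str, timeout: float) -> None:
        """Create a watchdog identified by name with a timeout in seconds."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._timeouts[name] = timeout
        self._last_fed[name] = self._clock()

    def feed(self, name: str) -> None:
        """Called periodically by the observed process to show it is alive."""
        if name not in self._timeouts:
            raise KeyError(name)
        self._last_fed[name] = self._clock()

    def expired(self) -> List[str]:
        """Names of all watchdogs that were not fed within their timeout."""
        now = self._clock()
        return sorted(name for name, timeout in self._timeouts.items()
                      if now - self._last_fed[name] > timeout)

    def tick(self) -> bool:
        """Pet the backend only if no registered watchdog has expired."""
        if self.expired():
            return False  # backend is starved and will eventually fire
        self._pet_backend()
        return True
```

With this shape, a crmd frozen by SIGSTOP simply stops calling feed("crmd"),
tick() stops petting the backend, and the node is reset once the backend's own
timeout runs out.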
>
>
> We can prepare a patch for pacemaker.
Always helpful to discuss/clarify an idea once some code is available ...
> Has the discussion about using the WD service already ended?
Not from my point of view. I just had a day off ;-)
>
>
> Best Regards,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
>> From: Klaus Wenninger <kwenning at redhat.com>
>> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>; users at clusterlabs.org
>> Cc:
>> Date: 2016/10/7, Fri 17:47
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in
>>> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>>>> On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>> Hi All,
>>>>>
>>>>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>>
>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>> will reboot the node (unless the watchdog fails).
>>>>> Thank you for the comment.
>>>>>
>>>>> We are examining a watchdog for crmd, too.
>>>>> I will comment again once the examination has advanced.
>>>> Was thinking of doing a small test implementation going
>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>
>>>> a couple of thoughts I had so far:
>>>>
>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>> an application can use to create a watchdog within sbd
>>> Why does it have to be done within sbd?
>> Not necessarily; it could also be spun off into a project of its own, or
>> something that already exists could be used.
>> I remember having added a D-Bus interface to
>> https://sourceforge.net/projects/watchdog/ for a project once.
>> If you have a suggestion I'm open.
>> Starting from sbd would have the advantage of a smooth start:
>>
>> - cluster/pacemaker-watcher are there already and can
>> be replaced/moved over time
>> - the lifecycle of the daemon (when started/stopped) is
>> already something that is in the code and in the people's minds
>>
>>>> - parameters for the first are a name and a timeout
>>>>
>>>> - first use-case would be crmd observation
>>>>
>>>> - later on we could think of removing pacemaker dependencies
>>>> from sbd by moving the actual implementation of
>>>> pacemaker-watcher and probably cluster-watcher as well
>>>> into pacemaker - using the new API
>>>>
>>>> - this of course creates sbd dependency within pacemaker so
>>>> that it would make sense to offer a simpler and self-contained
>>>> implementation within pacemaker as an alternative
>>> I think the watchdog interface is so simple that you don't need a relay
>>> for it. The only limit I can imagine is the number of watchdogs available on
>>> some specific hardware.
>> That is the point ;-)
>>>> thus it would be favorable to have the dependency
>>>> within a non-compulsory pacemaker-rpm so that
>>>> we can offer an alternative that doesn't use sbd
>>>> at maybe the cost of being less reliable or one
>>>> that owns a hardware-watchdog by itself for systems
>>>> where this is still unused.
>>>>
>>>> - e.g. via some kind of plugin (Andrew forgive me -
>>>> no pils ;-) )
>>>> - or via an additional daemon
>>>>
>>>> What did you have in mind?
>>>> Maybe it makes sense to synchronize...
>>>>
>>>> Regards,
>>>> Klaus
>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>>>> Cc:
>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>>>>> in message
>>>>>> <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Was the final conclusion given about this problem?
>>>>>>>
>>>>>>> If a user uses sbd, can the cluster evade a problem of SIGSTOP of crmd?
>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>
>>>>>>> We are interested in this problem, too.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org