[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
renayama19661014 at ybb.ne.jp
Fri Oct 14 09:21:55 UTC 2016
Hi Klaus,
Hi All,
I have made a prototype of a watchdog that uses the WD service.
- https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
Please review it and comment.
Best Regards,
Hideo Yamauchi.
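
For readers new to the thread, the "crmd should feed a watchdog" idea quoted below can be sketched as follows. This is only a minimal illustration, not the code in the commit linked above. It assumes the Linux watchdog character-device convention (any write re-arms the timer, and writing "V" just before close disarms it when magic close is supported). The function name feed_watchdog and the parameters is_healthy, interval and max_feeds are hypothetical.

```python
import time


def feed_watchdog(device_path, is_healthy, interval, max_feeds=None):
    """Feed a watchdog device for as long as is_healthy() returns True.

    Returns the number of feeds written. All names here are hypothetical.
    """
    feeds = 0
    # Unbuffered, so every write reaches the device immediately.
    with open(device_path, "wb", buffering=0) as dev:
        while max_feeds is None or feeds < max_feeds:
            if not is_healthy():
                # Stop feeding and leave the timer armed on purpose:
                # the hardware is then expected to reset the node.
                return feeds
            dev.write(b"\0")  # any write re-arms the timer
            feeds += 1
            time.sleep(interval)
        # Orderly shutdown: "magic close" asks the driver to disarm.
        dev.write(b"V")
    return feeds
```

A process frozen by SIGSTOP never reaches the write at all, so the timer expires without any explicit health check. On a real node device_path would typically be /dev/watchdog and interval would have to be well below the configured watchdog timeout.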
----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: "users at clusterlabs.org" <users at clusterlabs.org>
> Cc:
> Date: 2016/10/11, Tue 17:58
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>
> Hi Klaus,
>
> Thank you for your comment.
>
> I will make a prototype patch that uses the WD service.
>
> Please wait a little.
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
>
> ----- Original Message -----
>> From: Klaus Wenninger <kwenning at redhat.com>
>> To: users at clusterlabs.org
>> Cc:
>> Date: 2016/10/10, Mon 21:03
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> On 10/07/2016 11:10 PM, renayama19661014 at ybb.ne.jp wrote:
>>> Hi All,
>>>
>>> Our users may not necessarily use sbd.
>>>
>>> I confirmed that the WD service of corosync can be used as one
>>> method that does not rely on sbd.
>>> Pacemaker can watch its own processes through the WD service using
>>> CMAP and trigger the watchdog.
>>
>> Have to have a look at that...
>> But if we establish some in-between-layer in pacemaker we could have this
>> as one of the possibilities besides e.g. sbd (with enhanced API), going for
>> a watchdog-device directly, ...
>>
>>>
>>>
>>> We can prepare a patch for pacemaker.
>>
>> Always helpful to discuss/clarify an idea once some code is available ...
>>
>>> Has the discussion about using the WD service already concluded?
>>
>> Not from my pov. Just a day off ;-)
>>
>>>
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>> ----- Original Message -----
>>>> From: Klaus Wenninger <kwenning at redhat.com>
>>>> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>; users at clusterlabs.org
>>>> Cc:
>>>> Date: 2016/10/7, Fri 17:47
>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>
>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>>>>>> On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?
>>>>>>>>
>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>> Thank you for your comment.
>>>>>>>
>>>>>>> We are also examining a watchdog for crmd.
>>>>>>> I will comment again once the examination has progressed.
>>>>>> Was thinking of doing a small test implementation going
>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>
>>>>>> a couple of thoughts I had so far:
>>>>>>
>>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>> an application can use to create a watchdog within sbd
>>>>> Why has it to be done within sbd?
>>>> Not necessarily, could be spawned out as well into an own project or
>>>> something already existent could be taken.
>>>> Remember to have added a dbus-interface to
>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>> If you have a suggestion I'm open.
>>>> Going off sbd would have the advantage of a smooth start:
>>>>
>>>> - cluster/pacemaker-watcher are there already and can
>>>> be replaced/moved over time
>>>> - the lifecycle of the daemon (when started/stopped) is
>>>> already something that is in the code and in the people's minds
>>>>
>>>>>> - parameters for the first are a name and a timeout
>>>>>>
>>>>>> - first use-case would be crmd observation
>>>>>>
>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>> from sbd by moving the actual implementation of
>>>>>> pacemaker-watcher and probably cluster-watcher as well
>>>>>> into pacemaker - using the new API
>>>>>>
>>>>>> - this of course creates sbd dependency within pacemaker so
>>>>>> that it would make sense to offer a simpler and self-contained
>>>>>> implementation within pacemaker as an alternative
>>>>> I think the watchdog interface is so simple that you don't need a relay
>>>>> for it. The only limit I can imagine is the number of watchdogs available of
>>>>> some specific hardware.
>>>> That is the point ;-)
>>>>>> thus it would be favorable to have the dependency
>>>>>> within a non-compulsory pacemaker-rpm so that
>>>>>> we can offer an alternative that doesn't use sbd
>>>>>> at maybe the cost of being less reliable or one
>>>>>> that owns a hardware-watchdog by itself for systems
>>>>>> where this is still unused.
>>>>>>
>>>>>> - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>>>>>> - or via an additional daemon
>>>>>>
>>>>>> What did you have in mind?
>>>>>> Maybe it makes sense to synchronize...
>>>>>>
>>>>>> Regards,
>>>>>> Klaus
>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>>>>>> Cc:
>>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>
>>>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52 in message <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Was a final conclusion reached about this problem?
>>>>>>>>>
>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?
>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>
>>>>>>>>> We are interested in this problem, too.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>>
>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>>>
>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>>
>>>>
>>>>
>>
>>
>>
>>
>
>
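
The API idea quoted above (an application creates a watchdog by giving a name and a timeout, with crmd observation as the first use case) could look roughly like the sketch below. It is only an in-process illustration of the bookkeeping. It is not the sbd API and not the prototype patch, and WatchdogRegistry, register, feed, expired and FakeClock are hypothetical names.

```python
import time


class WatchdogRegistry:
    """Toy registry of named software watchdogs (hypothetical API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._timers = {}  # name -> (timeout in seconds, time of last feed)

    def register(self, name, timeout):
        """Create a watchdog that must be fed at least every `timeout` seconds."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._timers[name] = (timeout, self._clock())

    def feed(self, name):
        """Record a sign of life from the named client."""
        timeout, _ = self._timers[name]
        self._timers[name] = (timeout, self._clock())

    def expired(self):
        """Return the sorted names of all watchdogs that were not fed in time."""
        now = self._clock()
        return sorted(
            name
            for name, (timeout, last) in self._timers.items()
            if now - last > timeout
        )


class FakeClock:
    """Manually advanced clock, used only to demonstrate the registry."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds
```

In such a design the daemon that owns the hardware watchdog would keep feeding it only while expired() returns an empty list, so a frozen crmd that stops calling feed("crmd") would eventually cause a node reset.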