[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
renayama19661014 at ybb.ne.jp
Sun Nov 6 04:58:35 UTC 2016
Hi Klaus,
Hi Jan,
Hi All,
Regarding the watchdog using the WD service, there do not seem to be any objections.
I will start working on an official patch next week.
Best Regards,
Hideo Yamauchi.
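
The idea discussed in the quoted thread below (each client registers a watchdog with "a name and a timeout", and a frozen crmd then stops feeding it) can be illustrated with a small model. This is only a sketch under assumptions: `WatchdogRegistry` and its methods are hypothetical names, not the API of sbd, corosync's WD service, or Pacemaker, and the clock is injected so the behaviour can be checked without real time passing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class _Entry:
    timeout: float   # seconds the client may stay silent
    last_fed: float  # clock value at the last feed


class WatchdogRegistry:
    """Hypothetical in-process model of a multiplexing watchdog service."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, _Entry] = {}

    def register(self, name: str, timeout: float) -> None:
        # Registering counts as the first feed.
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._entries[name] = _Entry(timeout, self._clock())

    def feed(self, name: str) -> None:
        # Called periodically by a healthy client (e.g. crmd).
        self._entries[name].last_fed = self._clock()

    def unregister(self, name: str) -> None:
        self._entries.pop(name, None)

    def expired(self) -> List[str]:
        # Names of clients that have been silent longer than their timeout.
        now = self._clock()
        return sorted(
            name
            for name, entry in self._entries.items()
            if now - entry.last_fed > entry.timeout
        )

    def may_feed_hardware(self) -> bool:
        # The single hardware watchdog is only fed while no client has expired;
        # once feeding stops, the hardware watchdog resets the node.
        return not self.expired()
```

A daemon owning the hardware watchdog would call `may_feed_hardware()` in its own loop and stop feeding the device as soon as it returns `False`. A process stopped by SIGSTOP never calls `feed()`, so its entry expires after its timeout.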
----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc:
> Date: 2016/10/26, Wed 17:46
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>
> Hi Klaus,
> Hi Jan,
> Hi All,
>
> Our members discussed the watchdog using the WD service.
>
> 1) The WD service will not be removed.
> 2) With pacemaker_remote, it can be used by starting corosync bound to localhost.
> 3) Contention for the watchdog device needs to be considered.
> 4) Because I am thinking of the case where sbd is not used, I am not planning, for
> the moment, to add an interface similar to the corosync API to sbd.
>
> The user will choose between the method using sbd and the method using the WD service.
> Having two methods may cause confusion, but there is value for the
> user who does not use sbd.
>
> We want to include the watchdog using the WD service in Pacemaker.
> I intend to make an official patch.
>
> What do you think?
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> ----- Original Message -----
>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc:
>> Date: 2016/10/20, Thu 19:08
>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> Hi Klaus,
>> Hi Jan,
>>
>> Thank you for your comments.
>>
>> I will wait a little longer for other comments.
>> We will discuss this matter next week.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> ----- Original Message -----
>>> From: Jan Friesse <jfriesse at redhat.com>
>>> To: kwenning at redhat.com; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>> Cc:
>>> Date: 2016/10/20, Thu 15:46
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>
>>>>
>>>> On 10/14/2016 11:21 AM, renayama19661014 at ybb.ne.jp wrote:
>>>>> Hi Klaus,
>>>>> Hi All,
>>>>>
>>>>> I tried a prototype of the watchdog using the WD service.
>>>>> - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>>>>
>>>>> Please comment.
>>>> Thank you Hideo for providing the prototype.
>>>> Added the patch to my build and it seems to
>>>> be working as expected.
>>>>
>>>> A few thoughts triggered by this approach:
>>>>
>>>> - we have to alert the corosync-people as in
>>>> a chat with Jan Friesse he pointed me to the
>>>> fact that for corosync 3.x the wd-service was
>>>> planned to be removed
>>>
>>> Actually I didn't express myself correctly. What I wanted to say was
>>> "I'm considering the idea of removing it", simply because it's disabled
>>> in downstream.
>>>
>>> BUT keep in mind that removing functionality = asking the community to find out
>>> whether somebody is actively using it.
>>>
>>> And because there are active users and a future use case, removing wd is
>>> not an option.
>>>
>>>
>>>>
>>>> especially delicate as the binding is very loose
>>>> so that - as is - it builds against a corosync with
>>>> disabled wd-service without any complaints...
>>>>
>>>> - as of now if you enable wd-service in the
>>>> corosync-build it is on by default and would
>>>> be hogging the watchdog presumably
>>>> (there is obviously a pull request that makes
>>>> it default to off)
>>>>
>>>> - with my thoughts about adding an API to
>>>> sbd previously in the thread I was trying to
>>>> target closer observation of pacemaker_remoted
>>>> as well (remote-nodes don't have corosync
>>>> running)
>>>>
>>>> I guess it would be possible to run corosync
>>>> with a static config as single-node cluster
>>>> bound to localhost for that purpose.
>>>>
>>>> I read the thread about corosync-remote and
>>>> that happening might make the special-handling
>>>> for pacemaker-remote obsolete anyway ...
>>>>
>>>> - to enable the approach to live alongside
>>>> sbd it would be possible to make sbd use
>>>> the corosync-API as well for watchdog purposes
>>>> instead of opening the watchdog directly
>>>>
>>>> This shouldn't be a big deal for sbd used to
>>>> observe a pacemaker-node as cluster-watcher
>>>> (the part of sbd that sends cpg-pings to corosync)
>>>> already builds against corosync.
>>>> The blockdevice-part of sbd being basically
>>>> generic it might be an issue though.
>>>>
>>>> Regards,
>>>> Klaus
>>>>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>>> To: "users at clusterlabs.org" <users at clusterlabs.org>
>>>>>> Cc:
>>>>>> Date: 2016/10/11, Tue 17:58
>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>
>>>>>> Hi Klaus,
>>>>>>
>>>>>> Thank you for your comment.
>>>>>>
>>>>>> I will make a prototype patch that uses the WD service.
>>>>>>
>>>>>> Please wait a little.
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: Klaus Wenninger <kwenning at redhat.com>
>>>>>>> To: users at clusterlabs.org
>>>>>>> Cc:
>>>>>>> Date: 2016/10/10, Mon 21:03
>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>> On 10/07/2016 11:10 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Our users may not necessarily use sbd.
>>>>>>>>
>>>>>>>> I confirmed that using the WD service of corosync is one method that does not require sbd.
>>>>>>>> Through the WD service, using CMAP, Pacemaker can watch the pacemaker processes and carry out watchdog handling.
>>>>>>>
>>>>>>> Have to have a look at that...
>>>>>>> But if we establish some in-between-layer in pacemaker we could have this
>>>>>>> as one of the possibilities besides e.g. sbd (with enhanced API), going for
>>>>>>> a watchdog-device directly, ...
>>>>>>>
>>>>>>>>
>>>>>>>> We can set up a patch of pacemaker.
>>>>>>> Always helpful to discuss/clarify an idea once some code is available ...
>>>>>>>
>>>>>>>> Was the discussion of using WD service over so far?
>>>>>>> Not from my pov. Just a day off ;-)
>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Hideo Yamauchi.
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: Klaus Wenninger <kwenning at redhat.com>
>>>>>>>>> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>; users at clusterlabs.org
>>>>>>>>> Cc:
>>>>>>>>> Date: 2016/10/7, Fri 17:47
>>>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>>>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in
>>>>>>>>>> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>>>>>>>>>>> On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?
>>>>>>>>>>>>>
>>>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>>>>> Thank you for your comment.
>>>>>>>>>>>>
>>>>>>>>>>>> We are examining a watchdog for crmd, too.
>>>>>>>>>>>> I will comment again once the examination has progressed.
>>>>>>>>>>> Was thinking of doing a small test implementation going
>>>>>>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>>>>>> a couple of thoughts I had so far:
>>>>>>>>>>>
>>>>>>>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>>>>>> an application can use to create a watchdog within sbd
>>>>>>>>>> Why has it to be done within sbd?
>>>>>>>>> Not necessarily, could be spawned out as well into an own project or
>>>>>>>>> something already existent could be taken.
>>>>>>>>> Remember to have added a dbus-interface to
>>>>>>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>>>>>>> If you have a suggestion I'm open.
>>>>>>>>> Going off sbd would have the advantage of a smooth start:
>>>>>>>>>
>>>>>>>>> - cluster/pacemaker-watcher are there already and can
>>>>>>>>> be replaced/moved over time
>>>>>>>>> - the lifecycle of the daemon (when started/stopped) is
>>>>>>>>> already something that is in the code and in the people's minds
>>>>>>>>>>> - parameters for the first are a name and a timeout
>>>>>>>>>>>
>>>>>>>>>>> - first use-case would be crmd observation
>>>>>>>>>>>
>>>>>>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>>>>>>> from sbd by moving the actual implementation of
>>>>>>>>>>> pacemaker-watcher and probably cluster-watcher as well
>>>>>>>>>>> into pacemaker - using the new API
>>>>>>>>>>>
>>>>>>>>>>> - this of course creates sbd dependency within pacemaker so
>>>>>>>>>>> that it would make sense to offer a simpler and self-contained
>>>>>>>>>>> implementation within pacemaker as an alternative
>>>>>>>>>> I think the watchdog interface is so simple that you don't need a relay
>>>>>>>>>> for it. The only limit I can imagine is the number of watchdogs available of
>>>>>>>>>> some specific hardware.
>>>>>>>>> That is the point ;-)
>>>>>>>>>>> thus it would be favorable to have the dependency
>>>>>>>>>>> within a non-compulsory pacemaker-rpm so that
>>>>>>>>>>> we can offer an alternative that doesn't use sbd
>>>>>>>>>>> at maybe the cost of being less reliable or one
>>>>>>>>>>> that owns a hardware-watchdog by itself for systems
>>>>>>>>>>> where this is still unused.
>>>>>>>>>>>
>>>>>>>>>>> - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>>>>>>>>>>> - or via an additional daemon
>>>>>>>>>>>
>>>>>>>>>>> What did you have in mind?
>>>>>>>>>>> Maybe it makes sense to synchronize...
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Klaus
>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>>>>>>>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>>>>>>>>>>> Cc:
>>>>>>>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>>>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>>>>>>>>>>>> in message <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Has a final conclusion been reached about this problem?
>>>>>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?
>>>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
>>>>>>>>>>>>> will reboot the node (unless the watchdog fails).
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We are interested in this problem, too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>>>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>>>>>> Bugs: http://bugs.clusterlabs.org