[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
renayama19661014 at ybb.ne.jp
Thu Oct 20 12:08:35 CEST 2016
Hi Klaus,
Hi Jan,
Thank you for your comments.
I will wait a little longer for other comments.
We will discuss this matter next week.
Best Regards,
Hideo Yamauchi.
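
The idea discussed in the quoted thread below (a process such as crmd registers a watchdog with a name and a timeout and must feed it regularly, otherwise the node is reset) can be sketched roughly as follows. This is only an illustration under assumptions: the class `NamedWatchdogs` and its methods are hypothetical names, not the API of sbd, corosync, or pacemaker, and nothing here touches a real watchdog device.

```python
import time


class NamedWatchdogs:
    """Hypothetical in-memory registry of software watchdogs.

    Not an sbd/corosync/pacemaker API; it only models the
    "name + timeout" idea from the thread.
    """

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the logic can be exercised without sleeping.
        self._clock = clock
        self._entries = {}  # name -> (timeout_seconds, deadline)

    def register(self, name, timeout):
        """Create a watchdog that must be fed at least every `timeout` seconds."""
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self._entries[name] = (timeout, self._clock() + timeout)

    def feed(self, name):
        """Move the deadline of an existing watchdog forward by its timeout."""
        timeout, _ = self._entries[name]
        self._entries[name] = (timeout, self._clock() + timeout)

    def expired(self):
        """Return the sorted names of all watchdogs whose deadline has passed."""
        now = self._clock()
        return sorted(
            name for name, (_, deadline) in self._entries.items()
            if now >= deadline
        )
```

In such a design, a supervising daemon would keep feeding the hardware watchdog only while `expired()` returns an empty list. A crmd frozen by SIGSTOP stops calling `feed()`, so its entry expires and the node would be reset.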
----- Original Message -----
> From: Jan Friesse <jfriesse at redhat.com>
> To: kwenning at redhat.com; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc:
> Date: 2016/10/20, Thu 15:46
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>
>>
>> On 10/14/2016 11:21 AM, renayama19661014 at ybb.ne.jp wrote:
>>> Hi Klaus,
>>> Hi All,
>>>
>>> I made a prototype of a watchdog using the WD service.
>>> - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>>
>>> Please comment.
>> Thank you Hideo for providing the prototype.
>> Added the patch to my build and it seems to
>> be working as expected.
>>
>> A few thoughts triggered by this approach:
>>
>> - we have to alert the corosync-people as in
>> a chat with Jan Friesse he pointed me to the
>> fact that for corosync 3.x the wd-service was
>> planned to be removed
>
> Actually, I didn't express myself correctly. What I wanted to say was
> "I'm considering the idea of removing it", simply because it's disabled
> downstream.
>
> BUT keep in mind that removing functionality means first asking the
> community whether somebody is actively using it.
>
> And because there are active users and a future use case, removing wd is
> not an option.
>
>
>>
>> especially delicate as the binding is very loose
>> so that - as is - it builds against a corosync with
>> disabled wd-service without any complaints...
>>
>> - as of now if you enable wd-service in the
>> corosync-build it is on by default and would
>> be hogging the watchdog presumably
>> (there is obviously a pull request that makes
>> it default to off)
>>
>> - with my thoughts about adding an API to
>> sbd previously in the thread I was trying to
>> target closer observation of pacemaker_remoted
>> as well (remote-nodes don't have corosync
>> running)
>>
>> I guess it would be possible to run corosync
>> with a static config as single-node cluster
>> bound to localhost for that purpose.
>>
>> I read the thread about corosync-remote and
>> that happening might make the special-handling
>> for pacemaker-remote obsolete anyway ...
>>
>> - to enable the approach to live alongside
>> sbd it would be possible to make sbd use
>> the corosync-API as well for watchdog purposes
>> instead of opening the watchdog directly
>>
>> This shouldn't be a big deal for sbd used to
>> observe a pacemaker-node as cluster-watcher
>> (the part of sbd that sends cpg-pings to corosync)
>> already builds against corosync.
>> The blockdevice-part of sbd being basically
>> generic it might be an issue though.
>>
>> Regards,
>> Klaus
>>
>>>
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>> ----- Original Message -----
>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>> To: "users at clusterlabs.org" <users at clusterlabs.org>
>>>> Cc:
>>>> Date: 2016/10/11, Tue 17:58
>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>
>>>> Hi Klaus,
>>>>
>>>> Thank you for your comment.
>>>>
>>>> I will make a prototype patch that uses the WD service.
>>>>
>>>> Please wait a little.
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: Klaus Wenninger <kwenning at redhat.com>
>>>>> To: users at clusterlabs.org
>>>>> Cc:
>>>>> Date: 2016/10/10, Mon 21:03
>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>> On 10/07/2016 11:10 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Our users may not necessarily use sbd.
>>>>>>
>>>>>> I confirmed that the WD service of corosync can be used as one
>>>>>> method that does not require sbd.
>>>>>> Pacemaker can monitor its own processes through the WD service
>>>>>> via CMAP and trigger the watchdog.
>>>>>
>>>>> Have to have a look at that...
>>>>> But if we establish some in-between-layer in pacemaker we could have this
>>>>> as one of the possibilities besides e.g. sbd (with enhanced API), going for
>>>>> a watchdog-device directly, ...
>>>>>
>>>>>>
>>>>>> We can set up a patch of pacemaker.
>>>>> Always helpful to discuss/clarify an idea once some code is available ...
>>>>>
>>>>>> Has the discussion about using the WD service already ended?
>>>>> Not from my pov. Just a day off ;-)
>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: Klaus Wenninger <kwenning at redhat.com>
>>>>>>> To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>; users at clusterlabs.org
>>>>>>> Cc:
>>>>>>> Date: 2016/10/7, Fri 17:47
>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>> On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in
>>>>>>>> message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>>>>>>>>> On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of
>>>>>>>>>>>> SIGSTOP of crmd?
>>>>>>>>>>>
>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then
>>>>>>>>>>> stopping crmd will reboot the node (unless the watchdog fails).
>>>>>>>>>> Thank you for your comment.
>>>>>>>>>>
>>>>>>>>>> We are also examining a watchdog for crmd.
>>>>>>>>>> I will comment further once the examination has progressed.
>>>>>>>>> Was thinking of doing a small test implementation going
>>>>>>>>> a little in the direction Lars Ellenberg had been pointing out.
>>>>>>>>> A couple of thoughts I had so far:
>>>>>>>>>
>>>>>>>>> - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>>>> an application can use to create a watchdog within sbd
>>>>>>>> Why does it have to be done within sbd?
>>>>>>> Not necessarily, it could also be spun off into a project of its own, or
>>>>>>> something that already exists could be used.
>>>>>>> I remember having added a dbus-interface to
>>>>>>> https://sourceforge.net/projects/watchdog/ for a project once.
>>>>>>> If you have a suggestion I'm open.
>>>>>>> Starting out from sbd would have the advantage of a smooth start:
>>>>>>>
>>>>>>> - cluster/pacemaker-watcher are there already and can
>>>>>>> be replaced/moved over time
>>>>>>> - the lifecycle of the daemon (when started/stopped) is
>>>>>>> already something that is in the code and in people's minds
>>>>>>>>> - parameters for the first are a name and a timeout
>>>>>>>>>
>>>>>>>>> - first use-case would be crmd observation
>>>>>>>>>
>>>>>>>>> - later on we could think of removing pacemaker dependencies
>>>>>>>>> from sbd by moving the actual implementation of
>>>>>>>>> pacemaker-watcher and probably cluster-watcher as well
>>>>>>>>> into pacemaker - using the new API
>>>>>>>>>
>>>>>>>>> - this of course creates an sbd dependency within pacemaker so
>>>>>>>>> that it would make sense to offer a simpler and self-contained
>>>>>>>>> implementation within pacemaker as an alternative
>>>>>>>> I think the watchdog interface is so simple that you don't need a relay
>>>>>>>> for it. The only limit I can imagine is the number of watchdogs available on
>>>>>>>> some specific hardware.
>>>>>>> That is the point ;-)
>>>>>>>>> thus it would be favorable to have the dependency
>>>>>>>>> within a non-compulsory pacemaker-rpm so that
>>>>>>>>> we can offer an alternative that doesn't use sbd
>>>>>>>>> at maybe the cost of being less reliable or one
>>>>>>>>> that owns a hardware-watchdog by itself for systems
>>>>>>>>> where this is still unused.
>>>>>>>>>
>>>>>>>>> - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>>>>>>>>> - or via an additional daemon
>>>>>>>>>
>>>>>>>>> What did you have in mind?
>>>>>>>>> Maybe it makes sense to synchronize...
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Klaus
>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>>>>>>>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>>>>>>>>> Cc:
>>>>>>>>>>> Date: 2016/10/5, Wed 23:08
>>>>>>>>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>>>>>> <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52
>>>>>>>>>>> in message
>>>>>>>>>>> <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> Was a final conclusion reached about this problem?
>>>>>>>>>>>> If a user uses sbd, can the cluster avoid the problem of
>>>>>>>>>>>> SIGSTOP of crmd?
>>>>>>>>>>> As pointed out earlier, maybe crmd should feed a watchdog. Then
>>>>>>>>>>> stopping crmd will reboot the node (unless the watchdog fails).
>>>>>>>>>>>
>>>>>>>>>>>> We are interested in this problem, too.
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>>>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>>>> Bugs: http://bugs.clusterlabs.org