[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Sun Nov 6 00:58:35 EDT 2016

Hi Klaus,
Hi Jan,
Hi All,

Regarding the watchdog using the WD service, there do not seem to be any objections.
I will start work on an official patch next week.

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc: 
> Date: 2016/10/26, Wed 17:46
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
> 
> Hi Klaus,
> Hi Jan,
> Hi All,
> 
> Our team discussed the watchdog using the WD service.
> 
> 1) The WD service will not be removed.
> 2) With pacemaker_remote, it can be used by starting corosync on localhost.
> 3) Contention for the watchdog device has to be taken into account.
> 4) Because we are considering the case where sbd is not used, we do not plan
> to add an interface similar to the corosync API to sbd for the moment.
> 
> The user can choose between the method using sbd and the method using the
> WD service. Having two methods may cause confusion, but there is value for
> users who do not use sbd.
> 
> We want to include the watchdog using the WD service in Pacemaker.
> I intend to create an official patch.
> 
> What do you think?
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> ----- Original Message -----
>>  From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>  To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>  Cc: 
>>  Date: 2016/10/20, Thu 19:08
>>  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>> 
>>  Hi Klaus,
>>  Hi Jan,
>> 
>>  Thank you for your comments.
>> 
>>  I will wait a little longer for other comments.
>>  We will discuss this matter next week.
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
>> 
>> 
>>  ----- Original Message -----
>>>   From: Jan Friesse <jfriesse at redhat.com>
>>>   To: kwenning at redhat.com; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>   Cc: 
>>>   Date: 2016/10/20, Thu 15:46
>>>   Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>> 
>>>> 
>>>>    On 10/14/2016 11:21 AM, renayama19661014 at ybb.ne.jp wrote:
>>>>>    Hi Klaus,
>>>>>    Hi All,
>>>>> 
>>>>>    I tried a prototype of the watchdog using the WD service.
>>>>>      - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>>>> 
>>>>>    Please comment.
>>>>    Thank you Hideo for providing the prototype.
>>>>    I added the patch to my build and it seems to
>>>>    be working as expected.
>>>> 
>>>>    A few thoughts triggered by this approach:
>>>> 
>>>>    - we have to alert the corosync people, as in
>>>>       a chat Jan Friesse pointed out to me that
>>>>       the wd-service was planned to be removed
>>>>       in corosync 3.x
>>> 
>>>   Actually I didn't express myself correctly. What I wanted to say was
>>>   "I'm considering the idea of removing it", simply because it's
>>>   disabled downstream.
>>> 
>>>   BUT keep in mind that removing functionality means asking the community
>>>   to find out whether somebody is actively using it.
>>> 
>>>   And because there are active users and a future use case, removing wd
>>>   is not an option.
>>> 
>>> 
>>>> 
>>>>       especially delicate as the binding is very loose
>>>>       so that - as is - it builds against a corosync with
>>>>       disabled wd-service without any complaints...
>>>> 
>>>>    - as of now, if you enable the wd-service in the
>>>>       corosync build, it is on by default and would
>>>>       presumably be hogging the watchdog
>>>>       (there is apparently a pull request that makes
>>>>       it default to off)
>>>> 
>>>>    - with my thoughts about adding an API to
>>>>       sbd previously in the thread I was trying to
>>>>       target closer observation of pacemaker_remoted
>>>>       as well (remote-nodes don't have corosync
>>>>       running)
>>>> 
>>>>       I guess it would be possible to run corosync
>>>>       with a static config as a single-node cluster
>>>>       bound to localhost for that purpose.
>>>> 
>>>>       I read the thread about corosync-remote, and
>>>>       if that happens it might make the special handling
>>>>       for pacemaker-remote obsolete anyway ...
>>>> 
>>>>    - to enable the approach to live alongside
>>>>       sbd it would be possible to make sbd use
>>>>       the corosync-API as well for watchdog purposes
>>>>       instead of opening the watchdog directly
>>>> 
>>>>       This shouldn't be a big deal for sbd used to
>>>>       observe a pacemaker node, as the cluster-watcher
>>>>       (the part of sbd that sends cpg-pings to corosync)
>>>>       already builds against corosync.
>>>>       As the block-device part of sbd is basically
>>>>       generic, it might be an issue there though.
>>>> 
>>>>    Regards,
>>>>    Klaus
>>>> 
>>>>> 
>>>>> 
>>>>>    Best Regards,
>>>>>    Hideo Yamauchi.
>>>>> 
>>>>> 
>>>>>    ----- Original Message -----
>>>>>>    From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>>>    To: "users at clusterlabs.org" <users at clusterlabs.org>
>>>>>>    Cc:
>>>>>>    Date: 2016/10/11, Tue 17:58
>>>>>>    Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>> 
>>>>>>    Hi Klaus,
>>>>>> 
>>>>>>    Thank you for your comment.
>>>>>> 
>>>>>>    I am making a prototype patch using the WD service.
>>>>>> 
>>>>>>    Please wait a little.
>>>>>> 
>>>>>>    Best Regards,
>>>>>>    Hideo Yamauchi.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>    ----- Original Message -----
>>>>>>>      From: Klaus Wenninger <kwenning at redhat.com>
>>>>>>>      To: users at clusterlabs.org
>>>>>>>      Cc:
>>>>>>>      Date: 2016/10/10, Mon 21:03
>>>>>>>      Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>      On 10/07/2016 11:10 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>>>>       Hi All,
>>>>>>>> 
>>>>>>>>       Our users may not necessarily use sbd.
>>>>>>>> 
>>>>>>>>       I confirmed that the WD service of corosync can be used
>>>>>>>>       as one method that does not require sbd.
>>>>>>>>       Pacemaker can have its processes watched by the WD service
>>>>>>>>       via CMAP, and can thereby implement a watchdog.
>>>>>>> 
>>>>>>>      I have to have a look at that...
>>>>>>>      But if we establish some in-between layer in pacemaker, we could have this
>>>>>>>      as one of the possibilities besides e.g. sbd (with an enhanced API), going for
>>>>>>>      a watchdog device directly, ...
>>>>>>> 
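The "in-between layer" idea above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Pacemaker code: `WatchdogBackend`, `DeviceBackend`, `RecordingBackend`, `BACKENDS` and `run_supervised` are hypothetical names. The device backend relies only on the generic Linux watchdog device behaviour (opening the device starts the timer, any write pings it, and writing the character 'V' just before closing asks the driver to disarm unless the kernel enforces nowayout). Backends for sbd or for the corosync WD service would plug in the same way, but their client interfaces were still being discussed in this thread, so none is invented here.

```python
import os
from abc import ABC, abstractmethod


class WatchdogBackend(ABC):
    """Hypothetical interface that a process such as crmd would code against."""

    @abstractmethod
    def arm(self, timeout_s: int) -> None:
        """Start supervision with the given timeout in seconds."""

    @abstractmethod
    def kick(self) -> None:
        """Report that the supervised process is still making progress."""

    @abstractmethod
    def disarm(self) -> None:
        """Stop supervision on a clean shutdown."""


class DeviceBackend(WatchdogBackend):
    """Drives a Linux watchdog device (e.g. /dev/watchdog) directly.

    Any write pings the timer; writing 'V' just before close asks the
    driver to disarm ("magic close"), unless the kernel enforces nowayout.
    Setting the timeout via the WDIOC_SETTIMEOUT ioctl is left out, so
    `timeout_s` is ignored here and the driver's current timeout applies.
    """

    def __init__(self, path: str = "/dev/watchdog"):
        self.path = path
        self.fd = -1

    def arm(self, timeout_s: int) -> None:
        # Opening the device is what starts the timer.
        self.fd = os.open(self.path, os.O_WRONLY)

    def kick(self) -> None:
        os.write(self.fd, b"\0")

    def disarm(self) -> None:
        os.write(self.fd, b"V")
        os.close(self.fd)
        self.fd = -1


class RecordingBackend(WatchdogBackend):
    """Stand-in backend that only records calls, e.g. for tests."""

    def __init__(self):
        self.calls = []

    def arm(self, timeout_s: int) -> None:
        self.calls.append(("arm", timeout_s))

    def kick(self) -> None:
        self.calls.append(("kick",))

    def disarm(self) -> None:
        self.calls.append(("disarm",))


# Hypothetical configuration-driven selection of the backend.
BACKENDS = {"device": DeviceBackend, "recording": RecordingBackend}


def run_supervised(backend: WatchdogBackend, timeout_s: int, steps: int) -> None:
    """Toy main loop: arm, do `steps` units of work with a kick each, disarm."""
    backend.arm(timeout_s)
    for _ in range(steps):
        # Real work would happen here; a frozen process never reaches
        # the kick below, so the backend's timer runs out.
        backend.kick()
    backend.disarm()
```

Keeping the backend choice behind one interface is what would let sbd, the corosync WD service, or a direct device coexist as options, while callers such as crmd need not know which one is configured.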
>>>>>>>> 
>>>>>>>>       We can provide a patch for pacemaker.
>>>>>>>      It is always helpful to discuss/clarify an idea once some code is available ...
>>>>>>> 
>>>>>>>>       Has the discussion of using the WD service come to an end?
>>>>>>>      Not from my pov. Just a day off ;-)
>>>>>>> 
>>>>>>>> 
>>>>>>>>       Best Regards,
>>>>>>>>       Hideo Yamauchi.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>       ----- Original Message -----
>>>>>>>>>       From: Klaus Wenninger <kwenning at redhat.com>
>>>>>>>>>       To: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>; users at clusterlabs.org
>>>>>>>>>       Cc:
>>>>>>>>>       Date: 2016/10/7, Fri 17:47
>>>>>>>>>       Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>       On 10/07/2016 08:14 AM, Ulrich Windl wrote:
>>>>>>>>>>>>>        Klaus Wenninger <kwenning at redhat.com> wrote on 06.10.2016 at 18:03 in message <3980cfdd-ebd9-1597-f6bd-a1ca808f7688 at redhat.com>:
>>>>>>>>>>>        On 10/05/2016 04:22 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>>>>>>>>        Hi All,
>>>>>>>>>>>> 
>>>>>>>>>>>>>>        If a user uses sbd, can the cluster avoid the problem
>>>>>>>>>>>>>>        of crmd being stopped with SIGSTOP?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>        As pointed out earlier, maybe crmd should feed a watchdog.
>>>>>>>>>>>>>        Then stopping crmd will reboot the node (unless the watchdog fails).
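Why feeding a watchdog from crmd helps against SIGSTOP can be shown with a small model: a stopped process simply stops feeding, while the timer (in hardware or in a supervising service) keeps running. The Python sketch below is illustrative only; `SoftWatchdog`, `FakeClock` and `simulate` are hypothetical names, the clock is injected so the behaviour can be checked without sleeping, and in a real deployment expiry would reset the node instead of returning a flag.

```python
import time


class SoftWatchdog:
    """Expires unless feed() is called at least every `timeout_s` seconds."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # monotonic: immune to wall-clock jumps
        self.last_fed = clock()     # arming counts as the first feed

    def feed(self) -> None:
        self.last_fed = self.clock()

    def expired(self) -> bool:
        return self.clock() - self.last_fed > self.timeout_s


class FakeClock:
    """Manually advanced clock so the model can run without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def simulate(feed_times, check_at, timeout_s=10.0):
    """Feed at each time in `feed_times`, then report expiry at `check_at`.

    A process frozen by SIGSTOP corresponds to feed_times that stop early.
    """
    clock = FakeClock()
    wd = SoftWatchdog(timeout_s, clock)
    for t in feed_times:
        clock.now = t
        wd.feed()
    clock.now = check_at
    return wd.expired()


if __name__ == "__main__":
    healthy = list(range(2, 21, 2))      # crmd feeding every 2 s up to t=20
    frozen = [2, 4, 6]                   # crmd stopped right after t=6
    print(simulate(healthy, 20))         # False
    print(simulate(frozen, 20))          # True
```

With a 10-second timeout, a loop that feeds every 2 seconds never expires, whereas a process frozen right after feeding at t=6 is detected at any check later than t=16.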
>>>>>>>>>>>>        Thank you for your comment.
>>>>>>>>>>>> 
>>>>>>>>>>>>        We are examining a watchdog for crmd, too.
>>>>>>>>>>>>        I will comment further once our examination has progressed.
>>>>>>>>>>>        I was thinking of doing a small test implementation going
>>>>>>>>>>>        a little in the direction Lars Ellenberg had been pointing out.
>>>>>>>>>>>        A couple of thoughts I had so far:
>>>>>>>>>>> 
>>>>>>>>>>>        - add an API (via DBus or libqb - favoring libqb atm) to sbd
>>>>>>>>>>>          that an application can use to create a watchdog within sbd
>>>>>>>>>>        Why does it have to be done within sbd?
>>>>>>>>>       Not necessarily; it could be spun off into its own project as well,
>>>>>>>>>       or something already existing could be used.
>>>>>>>>>       I remember adding a dbus interface to
>>>>>>>>>       https://sourceforge.net/projects/watchdog/ for a project once.
>>>>>>>>>       If you have a suggestion, I'm open.
>>>>>>>>>       Starting from sbd would have the advantage of a smooth start:
>>>>>>>>> 
>>>>>>>>>       - cluster/pacemaker-watcher are there already and can
>>>>>>>>>         be replaced/moved over time
>>>>>>>>>       - the lifecycle of the daemon (when started/stopped) is
>>>>>>>>>         already something that is in the code and in people's minds
>>>>>>>>>>>        - parameters for the first are a name and a timeout
>>>>>>>>>>> 
>>>>>>>>>>>        - first use-case would be crmd observation
>>>>>>>>>>> 
>>>>>>>>>>>        - later on we could think of removing pacemaker dependencies
>>>>>>>>>>>          from sbd by moving the actual implementation of
>>>>>>>>>>>          pacemaker-watcher and probably cluster-watcher as well
>>>>>>>>>>>          into pacemaker - using the new API
>>>>>>>>>>> 
>>>>>>>>>>>        - this of course creates an sbd dependency within pacemaker, so
>>>>>>>>>>>          that it would make sense to offer a simpler and self-contained
>>>>>>>>>>>          implementation within pacemaker as an alternative
>>>>>>>>>>        I think the watchdog interface is so simple that you don't need a relay
>>>>>>>>>>        for it. The only limit I can imagine is the number of watchdogs
>>>>>>>>>>        available on some specific hardware.
>>>>>>>>>       That is the point ;-)
>>>>>>>>>>>          thus it would be favorable to have the dependency
>>>>>>>>>>>          within a non-compulsory pacemaker rpm so that
>>>>>>>>>>>          we can offer an alternative that doesn't use sbd,
>>>>>>>>>>>          maybe at the cost of being less reliable, or one
>>>>>>>>>>>          that owns a hardware watchdog by itself for systems
>>>>>>>>>>>          where this is still unused.
>>>>>>>>>>> 
>>>>>>>>>>>          - e.g. via some kind of plugin (Andrew forgive me - no pils ;-) )
>>>>>>>>>>>          - or via an additional daemon
>>>>>>>>>>> 
>>>>>>>>>>>        What did you have in mind?
>>>>>>>>>>>        Maybe it makes sense to synchronize...
>>>>>>>>>>> 
>>>>>>>>>>>        Regards,
>>>>>>>>>>>        Klaus
>>>>>>>>>>> 
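A relay along the lines proposed above, where clients register with a name and a timeout, also addresses the hardware limit Ulrich raised: a single daemon owns the one watchdog device (the Linux watchdog core lets only one process hold it open at a time) and multiplexes many clients onto it. The Python sketch below shows only the bookkeeping, under the assumption that the relay kicks the hardware only while every registered client has pinged within its own timeout. `WatchdogMux` and its methods are hypothetical and are not the sbd, DBus or libqb interface discussed here.

```python
import time


class WatchdogMux:
    """Hypothetical relay: many named clients share one hardware watchdog.

    Clients register with a name and a timeout and must ping within that
    timeout. The relay should kick the hardware watchdog only while no
    client is stale, so one stalled client lets the hardware timer expire.
    """

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.clients = {}  # name -> (timeout_s, time of last ping)

    def register(self, name: str, timeout_s: float) -> None:
        # Registration counts as the first ping.
        self.clients[name] = (timeout_s, self.clock())

    def unregister(self, name: str) -> None:
        # A clean shutdown must deregister, otherwise the node gets reset.
        self.clients.pop(name, None)

    def ping(self, name: str) -> None:
        timeout_s, _ = self.clients[name]  # KeyError for unknown clients
        self.clients[name] = (timeout_s, self.clock())

    def stale_clients(self) -> list:
        now = self.clock()
        return sorted(
            name
            for name, (timeout_s, last_ping) in self.clients.items()
            if now - last_ping > timeout_s
        )

    def should_kick_hardware(self) -> bool:
        return not self.stale_clients()


if __name__ == "__main__":
    now = [0.0]                              # manually advanced clock
    mux = WatchdogMux(clock=lambda: now[0])
    mux.register("crmd", 5.0)                # illustrative client names
    mux.register("cib", 10.0)
    now[0] = 4.0
    mux.ping("crmd")
    mux.ping("cib")
    print(mux.should_kick_hardware())        # True
    now[0] = 10.0                            # crmd frozen since t=4
    print(mux.stale_clients())               # ['crmd']
```

With `crmd` registered at 5 s and `cib` at 10 s, a frozen crmd makes `should_kick_hardware()` return False as soon as more than 5 s pass without a ping from it, and the hardware timer then runs out on its own.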
>>>>>>>>>>>>        Best Regards,
>>>>>>>>>>>>        Hideo Yamauchi.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>        ----- Original Message -----
>>>>>>>>>>>>>        From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>>>>>>>>>>>        To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>>>>>>>>>>>>>        Cc:
>>>>>>>>>>>>>        Date: 2016/10/5, Wed 23:08
>>>>>>>>>>>>>        Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>>>>>>>>>>>>         <renayama19661014 at ybb.ne.jp> wrote on 21.09.2016 at 11:52 in message <876439.61305.qm at web200311.mail.ssk.yahoo.co.jp>:
>>>>>>>>>>>>>>         Hi All,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>         Has a final conclusion been reached on this problem?
>>>>>>>>>>>>>>         If a user uses sbd, can the cluster avoid the problem
>>>>>>>>>>>>>>         of crmd being stopped with SIGSTOP?
>>>>>>>>>>>>>        As pointed out earlier, maybe crmd should feed a watchdog.
>>>>>>>>>>>>>        Then stopping crmd will reboot the node (unless the watchdog fails).
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>         We are interested in this problem, too.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>         Best Regards,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>         Hideo Yamauchi.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>    _______________________________________________
>>>>>>>>>>>>>>         Users mailing list: 
>>>   Users at clusterlabs.org
>>>>>>>>>>>>>>       
>>>   http://clusterlabs.org/mailman/listinfo/users
>>>>>>>>>>>>>>         Project Home: 
>>>   http://www.clusterlabs.org
>>>>>>>>>>>>>>         Getting started:
>>>>>>>>>       
>>>   http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>>>>>>         Bugs: 
>>>   http://bugs.clusterlabs.org