[Pacemaker] "crm resource restart" does not work on the DC node with crmd-transition-delay="2s"

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Aug 17 04:37:41 EDT 2011


Hi Kazutomo-san,

On Tue, Aug 02, 2011 at 06:12:12PM +0900, NAKAHIRA Kazutomo wrote:
> Hi, Andrew
> 
> (2011/08/01 12:13), Andrew Beekhof wrote:
> >2011/7/27 NAKAHIRA Kazutomo<nakahira.kazutomo at oss.ntt.co.jp>:
> >>Hi, all
> >>
> >>I configured crmd-transition-delay="2s" to address the following problem.
> >>
> >>  http://www.gossamer-threads.com/lists/linuxha/pacemaker/68504
> >>  http://developerbugs.linux-foundation.org/show_bug.cgi?id=2528
> >>
> >>Since then, the "crm resource restart" command can no longer
> >>restart any resource on the DC node.
> >># "crm resource restart" works fine on non-DC nodes.
> >># Please see the attached hb_report, generated in a simple environment.
> >>
> >>How can I use the "crm resource restart" command on the DC node
> >>with crmd-transition-delay="2s"?
> >
> >Sounds like the shell isn't waiting long enough.
> 
> I understand that this problem is hard to resolve through
> configuration and that the crm shell needs to be fixed. Is that
> right?
> 
> If so, I have made a patch for the crm shell that waits for
> crmd-transition-delay before checking the DC node status.
> 
> Please see the attached patches.

The patches look fine to me.

Cheers,

Dejan

> Best regards,
> 
> >
> >>
> >>I confirmed that I can avoid this problem with the following procedure:
> >>  1. "crm resource stop rsc-ID"
> >>  2. wait for crmd-transition-delay (2 seconds)
> >>  3. "crm resource start rsc-ID"
> >>but this behavior (restart does not work on the DC node)
> >>may confuse users.
> >>
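
The three-step workaround quoted above can be sketched as a small
script. This is only an illustration, not part of the crm shell or of
the attached patches: the function name restart_resource and its
parameters are hypothetical, and the only commands it runs are the two
"crm resource" invocations from the procedure.

```python
import subprocess
import time


def restart_resource(rsc_id, delay_seconds=2.0,
                     run=subprocess.check_call, sleep=time.sleep):
    """Stop a resource, wait for crmd-transition-delay, then start it.

    Hypothetical helper: `run` and `sleep` are injectable so the sequence
    can be exercised without a cluster. `delay_seconds` is assumed to match
    the configured crmd-transition-delay (2s in this thread).
    """
    run(["crm", "resource", "stop", rsc_id])
    # Give the DC time to finish the delayed transition before starting again.
    sleep(delay_seconds)
    run(["crm", "resource", "start", rsc_id])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    restart_resource("rsc-ID", run=print, sleep=lambda s: print("sleep", s))
```

The dry run prints the stop command, the wait, and the start command in
that order. On a real cluster, drop the run/sleep overrides.
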
> >>Best regards,
> >>
> >>--
> >>NAKAHIRA Kazutomo
> >>Infrastructure Software Technology Unit
> >>NTT Open Source Software Center
> >>
> >>_______________________________________________
> >>Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>
> >>Project Home: http://www.clusterlabs.org
> >>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >>
> >>
> >
> 
> 
> -- 
> NAKAHIRA Kazutomo
> Infrastructure Software Technology Unit
> NTT Open Source Software Center

> # HG changeset patch
> # User NAKAHIRA Kazutomo <nakahira.kazutomo at oss.ntt.co.jp>
> # Date 1312274729 -32400
> # Branch stable-1.0
> # Node ID 2b4a64c1bb737cfce61b1eaef0dca31d903d9b2e
> # Parent  db98485d06ed3fe0fe236509f023e1bd4a5566f1
> shell: crm_msec is deemed desirable to be located in the utils.py
> 
> diff -r db98485d06ed -r 2b4a64c1bb73 shell/modules/ra.py.in
> --- a/shell/modules/ra.py.in	Fri May 06 13:47:43 2011 +0200
> +++ b/shell/modules/ra.py.in	Tue Aug 02 17:45:29 2011 +0900
> @@ -224,37 +224,6 @@
>      depth = find_value(pl, "depth") or '0'
>      role = find_value(pl, "role")
>      return mk_monitor_name(role,depth)
> -def crm_msec(t):
> -    '''
> -    See lib/common/utils.c:crm_get_msec().
> -    '''
> -    convtab = {
> -        'ms': (1,1),
> -        'msec': (1,1),
> -        'us': (1,1000),
> -        'usec': (1,1000),
> -        '': (1000,1),
> -        's': (1000,1),
> -        'sec': (1000,1),
> -        'm': (60*1000,1),
> -        'min': (60*1000,1),
> -        'h': (60*60*1000,1),
> -        'hr': (60*60*1000,1),
> -    }
> -    if not t:
> -        return -1
> -    r = re.match("\s*(\d+)\s*([a-zA-Z]+)?", t)
> -    if not r:
> -        return -1
> -    if not r.group(2):
> -        q = ''
> -    else:
> -        q = r.group(2).lower()
> -    try:
> -        mult,div = convtab[q]
> -    except:
> -        return -1
> -    return (int(r.group(1))*mult)/div
>  def crm_time_cmp(a, b):
>      return crm_msec(a) - crm_msec(b)
>  
> diff -r db98485d06ed -r 2b4a64c1bb73 shell/modules/utils.py
> --- a/shell/modules/utils.py	Fri May 06 13:47:43 2011 +0200
> +++ b/shell/modules/utils.py	Tue Aug 02 17:45:29 2011 +0900
> @@ -199,6 +199,38 @@
>      s = get_stdout(add_sudo(cmd), stderr_on)
>      return s.split('\n')
>  
> +def crm_msec(t):
> +    '''
> +    See lib/common/utils.c:crm_get_msec().
> +    '''
> +    convtab = {
> +        'ms': (1,1),
> +        'msec': (1,1),
> +        'us': (1,1000),
> +        'usec': (1,1000),
> +        '': (1000,1),
> +        's': (1000,1),
> +        'sec': (1000,1),
> +        'm': (60*1000,1),
> +        'min': (60*1000,1),
> +        'h': (60*60*1000,1),
> +        'hr': (60*60*1000,1),
> +    }
> +    if not t:
> +        return -1
> +    r = re.match("\s*(\d+)\s*([a-zA-Z]+)?", t)
> +    if not r:
> +        return -1
> +    if not r.group(2):
> +        q = ''
> +    else:
> +        q = r.group(2).lower()
> +    try:
> +        mult,div = convtab[q]
> +    except:
> +        return -1
> +    return (int(r.group(1))*mult)/div
> +
>  def wait4dc(what = "", show_progress = True):
>      '''
>      Wait for the DC to get into the S_IDLE state. This should be
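
For anyone who wants to try the unit conversion outside the shell, here
is a standalone sketch of crm_msec adapted for Python 3. It is an
adaptation, not the patched code itself: the module-level name CONVTAB
is introduced here, and "//" replaces "/" so the result stays an
integer as it was under Python 2.

```python
import re

# (multiplier, divisor) pairs that convert a value in the given unit to
# milliseconds. The table mirrors the one in the patch; an empty unit
# means seconds.
CONVTAB = {
    'ms': (1, 1), 'msec': (1, 1),
    'us': (1, 1000), 'usec': (1, 1000),
    '': (1000, 1), 's': (1000, 1), 'sec': (1000, 1),
    'm': (60 * 1000, 1), 'min': (60 * 1000, 1),
    'h': (60 * 60 * 1000, 1), 'hr': (60 * 60 * 1000, 1),
}


def crm_msec(t):
    """Convert a time string such as "2s" to milliseconds; -1 on error."""
    if not t:
        return -1
    r = re.match(r"\s*(\d+)\s*([a-zA-Z]+)?", t)
    if not r:
        return -1
    unit = (r.group(2) or '').lower()
    if unit not in CONVTAB:
        return -1
    mult, div = CONVTAB[unit]
    # Floor division keeps the integer result that Python 2's "/" gave.
    return (int(r.group(1)) * mult) // div


if __name__ == "__main__":
    for value in ("2s", "2", "500ms", "1m", "2 fortnights", ""):
        print(repr(value), "->", crm_msec(value))
```

With this table, "2s" and a bare "2" both come out as 2000, while an
unknown unit or an empty string yields -1.
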

> # HG changeset patch
> # User NAKAHIRA Kazutomo <nakahira.kazutomo at oss.ntt.co.jp>
> # Date 1312275597 -32400
> # Branch stable-1.0
> # Node ID 422551903526667c380e39f1712b38f3c2b8f0a6
> # Parent  2b4a64c1bb737cfce61b1eaef0dca31d903d9b2e
> shell: waits crmd-transition-delay before DC status checking
> 
> diff -r 2b4a64c1bb73 -r 422551903526 shell/modules/utils.py
> --- a/shell/modules/utils.py	Tue Aug 02 17:45:29 2011 +0900
> +++ b/shell/modules/utils.py	Tue Aug 02 17:59:57 2011 +0900
> @@ -263,6 +263,13 @@
>      if not dc:
>          common_warn("can't find DC in: %s" % s)
>          return False
> +    cmd = "crm_attribute -Gq -t crm_config -n crmd-transition-delay 2> /dev/null"
> +    delay = get_stdout(add_sudo(cmd))
> +    if delay:
> +        delaymsec = crm_msec(delay)
> +        if 0 < delaymsec:
> +            common_info("The crmd-transition-delay is configured. Waiting %d msec before checking the DC status." % delaymsec)
> +            time.sleep(delaymsec / 1000.0)
>      cmd = "crmadmin -S %s" % dc
>      cnt = 0
>      output_started = 0
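
The ordering this change introduces (wait out crmd-transition-delay
first, then poll the DC) can be sketched independently of the shell
code. Everything named here is hypothetical and exists only for
illustration: wait_for_idle_dc, get_delay_msec, dc_is_idle,
poll_interval and max_polls are not taken from utils.py. In the shell
the delay comes from crm_attribute and the state from crmadmin -S; in
this sketch both are passed in as callables.

```python
import time


def wait_for_idle_dc(get_delay_msec, dc_is_idle, sleep=time.sleep,
                     poll_interval=1.0, max_polls=60):
    """Sleep for the transition delay, then poll until the DC is idle.

    get_delay_msec: callable returning the delay in milliseconds,
                    or a value <= 0 when none is configured.
    dc_is_idle:     callable returning True once the DC reports S_IDLE.
    Returns True if the DC became idle within max_polls checks.
    """
    delay_msec = get_delay_msec()
    if delay_msec > 0:
        # Divide by 1000.0 so a sub-second delay such as 500 ms is not
        # truncated to zero by integer division.
        sleep(delay_msec / 1000.0)
    for _ in range(max_polls):
        if dc_is_idle():
            return True
        sleep(poll_interval)
    return False


if __name__ == "__main__":
    states = iter([False, True])
    print(wait_for_idle_dc(lambda: 500, lambda: next(states),
                           sleep=lambda s: print("sleep", s)))
```

Without the initial sleep, the first poll can see the DC still idle
because the delayed transition has not started yet, which matches the
"shell isn't waiting long enough" diagnosis earlier in the thread.
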
