[Pacemaker] Question regarding starting of master/slave resources and ELECTIONs

Bob Schatz bschatz at yahoo.com
Fri Apr 15 18:30:00 EDT 2011


Andrew,

Comments at end with <BS>



________________________________
From: Andrew Beekhof <andrew at beekhof.net>
To: Bob Schatz <bschatz at yahoo.com>
Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Sent: Fri, April 15, 2011 4:28:52 AM
Subject: Re: [Pacemaker] Question regarding starting of master/slave resources 
and ELECTIONs

On Fri, Apr 15, 2011 at 5:58 AM, Bob Schatz <bschatz at yahoo.com> wrote:
> Andrew,
> Thanks for the help
> Comments inline with <BS>
> ________________________________
> From: Andrew Beekhof <andrew at beekhof.net>
> To: Bob Schatz <bschatz at yahoo.com>
> Cc: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Sent: Thu, April 14, 2011 2:14:40 AM
> Subject: Re: [Pacemaker] Question regarding starting of master/slave
> resources and ELECTIONs
>
> On Thu, Apr 14, 2011 at 10:49 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>>> I noticed that 4 of the master/slave resources will start right away but
>>>> the
>>>> 5 master/slave resource seems to take a minute or so and I am only
>>>> running
>>>> with one node.
>>>> Is this expected?
>>>
>>> Probably, if the other 4 take around a minute each to start.
>>> There is an lrmd config variable that controls how much parallelism it
>>> allows (but i forget the name).
>>> <Bob> It's max-children and I set it to 40 for this test to see if it
>>> would
>>> change the behavior.  (/sbin/lrmadmin -p max-children 40)
>>
>> That's surprising.  I'll have a look at the logs.
>
> Looking at the logs, I see a couple of things:
>
>
> This is very bad:
> Apr 12 19:33:42 mgraid-S000030311-1 crmd: [17529]: WARN: get_uuid:
> Could not calculate UUID for mgraid-s000030311-0
> Apr 12 19:33:42 mgraid-S000030311-1 crmd: [17529]: WARN:
> populate_cib_nodes_ha: Node mgraid-s000030311-0: no uuid found
>
> For some reason pacemaker can't get the node's uuid from heartbeat.
>
> <BS> I create the uuid when the node comes up.

Heartbeat should have already created it before pacemaker even got
started though.

>
> So we start a few things:
>
> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
> do_lrm_rsc_op: Performing
> key=23:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
> op=SSS000030311:0_start_0 )
> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
> do_lrm_rsc_op: Performing
> key=49:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
> op=SSJ000030312:0_start_0 )
> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
> do_lrm_rsc_op: Performing
> key=75:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
> op=SSJ000030313:0_start_0 )
> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
> do_lrm_rsc_op: Performing
> key=101:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
> op=SSJ000030314:0_start_0 )
>
> But then another change comes in:
>
> Apr 12 19:33:41 mgraid-S000030311-1 crmd: [17529]: info:
> abort_transition_graph: need_abort:59 - Triggered transition abort
> (complete=0) : Non-status change
>
> Normally we'd recompute and keep going, but it was a(nother) replace
> operation, so:
>
> Apr 12 19:33:42 mgraid-S000030311-1 crmd: [17529]: info:
> do_state_transition: State transition S_TRANSITION_ENGINE ->
> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
> origin=do_cib_replaced ]
>
> All the time goes here:
>
> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
> action_timer_callback: Timer popped (timeout=20000,
> abort_level=1000000, complete=true)
> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
> action_timer_callback: Ignoring timeout while not in transition
> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
> action_timer_callback: Timer popped (timeout=20000,
> abort_level=1000000, complete=true)
> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
> action_timer_callback: Ignoring timeout while not in transition
> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
> action_timer_callback: Timer popped (timeout=20000,
> abort_level=1000000, complete=true)
> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
> action_timer_callback: Ignoring timeout while not in transition
> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
> action_timer_callback: Timer popped (timeout=20000,
> abort_level=1000000, complete=true)
> Apr 12 19:35:31 mgraid-S000030311-1 crmd: [17529]: WARN:
> action_timer_callback: Ignoring timeout while not in transition
> Apr 12 19:37:00 mgraid-S000030311-1 crmd: [17529]: ERROR:
> crm_timer_popped: Integration Timer (I_INTEGRATED) just popped!
>
> but it's not at all clear to me why - although certainly avoiding the
> election would help.
> Is there any chance to load all the changes at once?
>
> <BS> Yes.  That worked.  I created the configuration in a file and then did
> a "crm configure load update <filename>" to avoid the election.
> Possibly the delay is related to the UUID issue above, or possibly it is
> related to one of these two patches that went in after 1.0.9:
>
> andrew (stable-1.0)    High: crmd: Make sure we always poke the FSA after
> a transition to clear any TE_HALT actions CS: 9187c0506fd3 On:
> 2010-07-07
> andrew (stable-1.0)    High: crmd: Reschedule the PE_START action if its
> not already running when we try to use it CS: e44dfe49e448 On:
> 2010-11-11
>
> Could you try turning on debug and/or a more recent version?
>
> <BS>  I turned on debug and grabbed the logs, configuration, and
> /var/lib/pengine directory.   They are attached.
>      Unfortunately I cannot try a new version with this hardware at this
> time. :(

Based on the new logs, it looks like the problem is resolved by
loading everything at once.
All start actions appear to occur within a second of Apr 14 20:35:43.


<BS>  This run did not create the resources all at once.  It did start the 
resources at Apr 14 20:35:43.
However, when looking at the log debug/ctlr0/ha-debug you will see these entries 
(note that this node is not the one running the pengine):

Apr 14 20:35:52 mgraid-S000030311-0 lrmd: [15228]: debug: on_msg_get_state:state of rsc mgraid-stonith:0 is LRM_RSC_IDLE
Apr 14 20:35:52 mgraid-S000030311-0 lrmd: [15228]: debug: on_msg_get_state:state of rsc SSJ000030313:0 is LRM_RSC_BUSY
Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: WARN: msg_to_op(1324): failed to get the value of field lrm_opstatus from a ha_msg
Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: msg_to_op: Message follows:
Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: MSG: Dumping message with 16 fields
Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: MSG[0] : [lrm_t=op]
Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: MSG[1] : [lrm_rid=SSJ000030313:0]
Apr 14 20:35:52 mgraid-S000030311-0 crmd: [15231]: info: MSG[2] : [lrm_op=start]

This was reported as bug report 2580.

Later, this resource was stopped by the pengine for some reason.

Unfortunately, I grabbed the logs before resources such as SSJ000030313:0 
were stopped.  :( :(

Anyway, I was able to get things to work correctly as summarized below.

I think that bug 2580 covers all of the issues needed to get the problem fixed.  
Do you agree?

In the meantime, here is a summary of the workaround steps I used:

1. Increase the number of child processes lrmd may fork to run the resource 
agent scripts (start, monitor, etc.) with this command, run from 
/etc/ha.d/resource.d/startstop (post-start action); a sketch of the hook 
follows the command below:

       #  /sbin/lrmadmin -p max-children 8
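
For reference, here is a minimal sketch of how that hook could be wired up.  
The case structure and the way the action name is passed in are assumptions 
about how my startstop script is invoked; only the lrmadmin line is the actual 
command:

       # excerpt from /etc/ha.d/resource.d/startstop (illustrative sketch)
       case "$1" in
         post-start)
           # Raise lrmd's limit on concurrent resource agent children so all
           # of the master/slave instances can be started in parallel.
           /sbin/lrmadmin -p max-children 8
           ;;
       esac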

2. Create all of the resources needed in one file (basically, take the output 
from a "crm configure show" on an already configured system and copy it to a 
file) as opposed to creating each resource and its associated constraints with 
individual crm commands.  This avoids the "replace" step that crm may perform.  
The "replace" operation causes the crmds to hold an election, which is what 
causes my problems.  (I assume it is a bug related to 2580, but I am not sure.)  
A rough sketch of such a file follows this item.
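
As a rough illustration, the combined file is just the crm shell syntax that 
"crm configure show" prints.  The resource agent (ocf:myvendor:myagent), the 
ms wrapper name and the meta attributes below are made-up placeholders; only 
the primitive name mirrors the ones in my logs:

       primitive SSS000030311 ocf:myvendor:myagent \
               op monitor interval="10s" role="Master" \
               op monitor interval="20s" role="Slave"
       ms ms-SSS000030311 SSS000030311 \
               meta master-max="1" clone-max="2" notify="true"

with a corresponding primitive/ms pair for each of the other master/slave 
resources, plus the stonith resource and any constraints, all in the same file.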

3. Do an "update", as opposed to a "replace", of the HA configuration with this 
command:

      # crm configure load update <filename>
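
Putting steps 2 and 3 together, the end-to-end sequence looks roughly like 
this, taking the output of "crm configure show" from an already configured 
system and loading it on the target node (the file name is just an example):

      # crm configure show > /tmp/mgraid-config.crm
      # crm configure load update /tmp/mgraid-config.crm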


Thanks Andrew for your help!

Bob