[Pacemaker] Pacemaker failover delays (followup)

Fri Mar 8 17:50:10 EST 2013

Andrew,

Thanks for the feedback to my earlier questions from March 6th.  I've done some further investigation into the timing of what I'd call the "simple" failover case, where an SSID that is master on the DC node is killed and it takes 10-12 seconds before the slave SSID on the other node transitions to master.  (Recall that "SSID" is a SliceServer app instance, each of which is abstracted as a Pacemaker resource.)

Before going into my findings, I want to clear up a couple of misstatements on my part.

* WRT my mention of "notifications" in my earlier e-mail, I misused the term.  I was simply referring to the "notify" events passed from the DC to the other node.

* I also misspoke when I said that the failed SSID was subsequently restarted as a result of a monitor event.  In fact, the SSID process is restarted by the "ss" resource agent script in response to a "start" event from lrmd.

The key issue, however, is the time required - 10 to 12 seconds - from the time the master SSID is killed until the slave fails over to become master.  You opined that the time required would largely depend upon the behavior of the resource agent, which in our case is a script called "ss".  To determine how much the ss script's execution time contributes, I modified it to log the current monotonic system clock value each time it starts, and just before it exits.  The log messages specify the clock value in ms.

From this, I did find several instances where the ss script would take just over a second to complete execution.  In each such case, the "culprit" is an exec of "crm_node -p", which is called to determine how many nodes are presently in the cluster.  (I've verified this timing independently by executing "crm_node -p" from a command line when the cluster is quiescent.)  This seems like a rather long time for a simple objective.  What would "crm_node -p" do that would take so long?
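
For anyone wanting to reproduce the measurement, here is a minimal sketch of one way to time a single command against the monotonic clock.  It is not the instrumentation in the ss script itself; the function name time_command_ms is hypothetical, and the example runs a harmless stand-in command so that it works on a machine without Pacemaker.

```python
import subprocess
import sys
import time


def time_command_ms(argv):
    """Run argv and return (elapsed_ms, return_code), timed with the monotonic clock."""
    start = time.monotonic()
    result = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return elapsed_ms, result.returncode


# Stand-in command so the sketch runs anywhere.  On a cluster node,
# substitute ["crm_node", "-p"] to time the call discussed above.
elapsed_ms, rc = time_command_ms([sys.executable, "-c", "pass"])
print(f"elapsed: {elapsed_ms:.0f} ms (exit {rc})")
```

The monotonic clock is used rather than time of day so that a clock adjustment during the run cannot distort the result.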

That notwithstanding, from the POV of the slave during the failover, there are delays of several hundred to about 1400ms between the completion of the ss script and its invocation for the next event.  To explain, I've attached an Excel spreadsheet (which I've verified is virus-free) that documents two experiments.  In each case, there's an SSID instance that's master on node-0, the DC, and which is killed.  The spreadsheet includes a synopsis of the log messages that follow on both nodes, interleaved into a timeline.

By way of explanation, columns B-D contain timestamp information for node-0 and columns E-G for node-1.  Columns B/E show the current time of day, C/F show the monotonic clock value when the ss script begins execution (in ms, truncated to the 5 least significant digits), and D/G show the duration of the ss script execution for the relevant event.  Column H is text extracted from the log, showing the key text.  In some cases there is a significant amount of information in the log file relating to pengine behavior, but I omitted such information from the spreadsheet.  Column I contains explanatory comments.
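
To make the column arithmetic concrete, here is a rough sketch of how the idle time between one ss invocation finishing and the next one starting can be derived from the start (C/F) and duration (D/G) columns for a single node.  The names Invocation and idle_gaps are hypothetical, the sample values are made up and are not taken from the spreadsheet, and the sketch assumes every gap is shorter than 100 seconds so that a wrap of the truncated clock can be undone with a modulo.

```python
from collections import namedtuple

# One ss invocation on a node: start_ms is the monotonic clock truncated to
# its 5 least significant digits, duration_ms is the script's run time.
Invocation = namedtuple("Invocation", "event start_ms duration_ms")

WRAP = 100_000  # a clock truncated to 5 decimal digits wraps every 100,000 ms


def idle_gaps(invocations):
    """Return (previous_event, next_event, gap_ms) for consecutive invocations."""
    gaps = []
    for prev, nxt in zip(invocations, invocations[1:]):
        prev_end = prev.start_ms + prev.duration_ms
        # The modulo undoes a wrap of the truncated clock (assumes gap < WRAP).
        gap = (nxt.start_ms - prev_end) % WRAP
        gaps.append((prev.event, nxt.event, gap))
    return gaps


# Made-up values for illustration only.
sample = [
    Invocation("notify", 99_200, 150),
    Invocation("promote", 100, 300),  # truncated clock wrapped past 99,999
    Invocation("notify", 1_800, 120),
]
for prev_event, next_event, gap in idle_gaps(sample):
    print(f"{prev_event} -> {next_event}: {gap} ms idle")
```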

Realizing that we need to look forward to upgrading our Pacemaker version (from 1.0.9), I wonder if you can clear up a couple of questions.  We are presently using Heartbeat, which I believe restricts our upgrade to the 1.0 branch, correct?  In other words, if we want to upgrade to the 1.1 branch, are we required to replace Heartbeat with Corosync?  Secondly, when upgrading, are there kernel dependencies to worry about?  We are presently running on the open source kernel version 2.6.18.  We plan to migrate to the most current 2.8 or 3.0 version later this year, at which time it would probably make sense to bring Pacemaker up to date.

I apologize for the length of this posting, and again appreciate any assistance you can offer.

Regards,
  Michael Powell

    Michael Powell
    Staff Engineer

    15220 NW Greenbrier Pkwy
        Suite 290
    Beaverton, OR   97006
    T 503-372-7327    M 503-789-3019   H 503-625-5332

    www.harmonicinc.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 07MarTimeline.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 18841 bytes
Desc: 07MarTimeline.xlsx
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130308/d31c0cf8/attachment-0001.xlsx>