<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:"Comic Sans MS";
panose-1:3 15 7 2 3 3 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Comic Sans MS";
color:windowtext;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:176848183;
mso-list-type:hybrid;
mso-list-template-ids:-33552840 67698689 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="2050" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>Andrew,<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>Thanks for the feedback on my earlier questions from March 6th. I’ve done some further investigation into the timing of what I’d call the “simple” failover case, in which an SSID that is master on the DC node is killed and it takes 10-12 seconds before the slave SSID on the other node transitions to master. (Recall that an “SSID” is a SliceServer app instance, each of which is abstracted as a Pacemaker resource.)<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>Before going into my findings, I want to clear up a couple of misstatements on my part.<o:p></o:p></span></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]><span style='font-family:"Comic Sans MS"'>Regarding my mention of “notifications” in my earlier e-mail, I misused the term. I was simply referring to the “notify” events passed from the DC to the other node.<o:p></o:p></span></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='font-family:Symbol'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]><span style='font-family:"Comic Sans MS"'>I also misspoke when I said that the failed SSID was subsequently restarted as a result of a monitor event. 
In fact, the SSID process is restarted by the “ss” resource agent script in response to a “start” event from lrmd.<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>The key issue, however, is the time required (10 to 12 seconds) from the moment the master SSID is killed until the slave fails over to become master. You opined that this time would depend largely on the behavior of the resource agent, which in our case is a script called “ss”. To determine what effect the ss script’s execution has, I modified it to log the current monotonic system clock value each time it starts and just before it exits. The log messages give the clock value in ms.<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>From this, I found several instances where the ss script took just over a second to complete. In each such case, the “culprit” is an exec of “crm_node -p”, which is called to determine how many nodes are presently in the cluster. (I’ve verified this timing independently by running “crm_node -p” from the command line while the cluster is quiescent.) This seems like a rather long time for such a simple task. What does “crm_node -p” do that takes so long?<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>That notwithstanding, from the slave’s point of view during the failover, there are delays ranging from several hundred ms to about 1400 ms between the completion of the ss script and its invocation for the next event. To illustrate, I’ve attached an Excel spreadsheet (which I’ve verified is virus-free) that documents two experiments. 
In each experiment, an SSID instance that is master on node-0 (the DC) is killed. The spreadsheet includes a synopsis of the resulting log messages on both nodes, interleaved into a single timeline.<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>By way of explanation, columns B-D contain timestamp information for node-0 and columns E-G the same for node-1. Columns B/E show the time of day, C/F show the monotonic clock value when the ss script begins execution (in ms, truncated to the 5 least significant digits), and D/G show the duration of the ss script execution for the relevant event. Column H contains the key text extracted from the log. In some cases the log file contains a significant amount of information relating to pengine behavior, which I omitted from the spreadsheet. Column I contains explanatory comments.<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>Since we will eventually need to upgrade our Pacemaker version (from 1.0.9), I wonder if you could clear up a couple of questions. We are presently using Heartbeat, which I believe restricts our upgrade to the 1.0 branch; is that correct? In other words, if we want to upgrade to the 1.1 branch, are we required to replace Heartbeat with Corosync? Second, are there kernel dependencies to worry about when upgrading? We are presently running the open source kernel, version 2.6.18. 
We plan to migrate to the most current 2.8 or 3.0 version later this year, at which time it would probably make sense to bring Pacemaker up to date.<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>I apologize for the length of this posting, and again appreciate any assistance you can offer.<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'>Regards,<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'> Michael Powell<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Comic Sans MS"'><o:p> </o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'> Michael Powell<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'> Staff Engineer<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'> 15220 NW Greenbrier Pkwy<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'> Suite 290<o:p></o:p></span></p><p class=MsoNormal 
style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'> Beaverton, OR 97006<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'> T 503-372-7327 M 503-789-3019 H 503-625-5332<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='margin-left:.5in'><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#1F497D'> <a href="http://www.harmonicinc.com"><span style='color:blue'>www.harmonicinc.com</span></a><o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p></div></body></html>