<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from rtf -->
<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<font face="Calibri" size="2"><span style="font-size:11pt;">
<div>Hello all,</div>
<div> </div>
<div>We&#8217;re in the process of implementing geo-redundancy on SLES 11 SP3 (booth version 0.1.0). We are seeing site 2 in a geo-cluster decide that the ticket has expired long before its actual expiry. Here&#8217;s an example timeline:</div>
<div> </div>
<div>1 - All sites (site 1, site 2 and the arbitrator) agree on the ticket owner and expiry, i.e. site 2 holds the ticket with a 60-second expiry:</div>
<div>Aug 25 10:07:10 linux-4i31 booth-arbitrator: [22526]: info: command: 'crm_ticket -t geo-ticket -S expires -v 1408975690' was executed</div>
<div>Aug 25 10:07:10 bb5Btas0 booth-site: [27782]: info: command: 'crm_ticket -t geo-ticket -S expires -v 1408975690' was executed</div>
<div>Aug 25 10:07:10 bb5Atas1 booth-site: [7826]: info: command: 'crm_ticket -t geo-ticket -S expires -v 1408975690' was executed</div>
<div> </div>
<div>2 - After 48 seconds (80% of the way into the lease), the expiry is renewed and all three nodes are still in agreement:</div>
<div>Site 2: </div>
<div>Aug 25 10:07:58 bb5Btas0 booth-site: [27782]: info: command: 'crm_ticket -t geo-ticket -S owner -v 2' was executed </div>
<div>Aug 25 10:07:58 bb5Btas0 booth-site: [27782]: info: command: 'crm_ticket -t geo-ticket -S expires -v 1408975738' was executed</div>
<div> </div>
<div>The arbitrator: </div>
<div>Aug 25 10:07:58 linux-4i31 crm_ticket[23836]: notice: crm_log_args: Invoked: crm_ticket -t geo-ticket -S owner -v 2</div>
<div>Aug 25 10:07:58 linux-4i31 booth-arbitrator: [22526]: info: command: 'crm_ticket -t geo-ticket -S expires -v 1408975738' was executed</div>
<div> </div>
<div>Site 1:</div>
<div>Aug 25 10:07:58 bb5Atas1 booth-site: [7826]: info: command: 'crm_ticket -t geo-ticket -S owner -v 2' was executed</div>
<div>Aug 25 10:07:58 bb5Atas1 booth-site: [7826]: info: command: 'crm_ticket -t geo-ticket -S expires -v 1408975738' was executed</div>
<div> </div>
<div>3 - Site 2 decides that the ticket has expired at 10:08:10, the expiry time set in step 1, as if the renewal in step 2 had never happened:</div>
<div>Aug 25 10:08:10 bb5Btas0 booth-site: [27782]: debug: lease expires ...</div>
<div> </div>
<div>4 - At 10:08:58 (the expiry time set in step 2), both site 1 and the arbitrator expire the lease and pick a new master.</div>
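<div>&#160;</div>
<div>As a sanity check on the timeline above, here is a minimal Python sketch (an illustration only, not booth code) that converts the logged expires values to wall-clock time. It assumes the syslog timestamps are local time at UTC-4, which is what makes the epoch values line up with the 60-second lease; the names LOCAL, LEASE_SECONDS, RENEW_PERCENT and local_time are made up for this example.</div>
<pre>
```python
from datetime import datetime, timedelta, timezone

# Assumption: the syslog timestamps are local time at UTC-4.
LOCAL = timezone(timedelta(hours=-4))

LEASE_SECONDS = 60   # ticket expiry from step 1
RENEW_PERCENT = 80   # renewal was observed 48 s into the 60 s lease


def local_time(epoch):
    """Format an epoch value (as passed to 'crm_ticket -S expires -v') as HH:MM:SS."""
    return datetime.fromtimestamp(epoch, LOCAL).strftime("%H:%M:%S")


first_expiry = 1408975690    # value logged in step 1
renewed_expiry = 1408975738  # value logged in step 2

# Work backwards from the first expiry to the grant and renewal times.
granted_at = first_expiry - LEASE_SECONDS
renewed_at = granted_at + LEASE_SECONDS * RENEW_PERCENT // 100

print("granted:", local_time(granted_at))
print("renewed:", local_time(renewed_at))
print("expiry from step 1 (used by site 2):", local_time(first_expiry))
print("expiry from step 2 (used by site 1 and arbitrator):", local_time(renewed_expiry))
print("seconds between the two expiries:", renewed_expiry - first_expiry)
```
</pre>
<div>Under that assumption the two expires values correspond to 10:08:10 and 10:08:58, so site 2 drops the ticket 48 seconds before site 1 and the arbitrator do.</div>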
<div> </div>
<div>I presume that some communication between site 2 and the rest of the geo-cluster was missed, but there is nothing in the logs to confirm this. Any hints on how to debug it?</div>
<div> </div>
<div>BTW: we only ever see this on site 2, never on site 1. This is consistent across several labs. Is there a bias towards site 1?</div>
<div> </div>
<div>Thanks in advance,</div>
<div> </div>
<div>Rob</div>
<div> </div>
<div> </div>
</span></font>
</body>
</html>