<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">

<div>I'm reading through the additions you made to the pgsql resource agent to support streaming replication in Postgres 9.1+. I'm trying to determine whether your resource agent compensates when the node being promoted (the new master) does not have the newest data.</div>

<div><br>

</div>

<div>From the looks of the pgsql_pre_promote function, it seems that it will just fail the other replicas (slaves) that have newer data, but will continue promoting the new master even though it does not have the latest data.</div>

<div><br>

</div>

<div>If this is correct, is there a way to force the promotion of the node with the newest data?</div>

<div><br>

</div>

<div>v/r</div>

<div><br>

</div>

<div>STEVE</div>

<div><br>

</div>

<div><br>

</div>

<div>

<div>On Mar 26, 2013, at 8:19 AM, Steven Bambling &lt;<a href="mailto:smbambling@arin.net">smbambling@arin.net</a>&gt; wrote:</div>

<br class="Apple-interchange-newline">

<blockquote type="cite">

<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">

Excellent, thanks so much for the clarification. I'll drop this new RA in and see if I can get things working.

<div><br>

</div>

<div>STEVE</div>

<div><br>

</div>

<div><br>

<div>

<div>On Mar 26, 2013, at 7:38 AM, Rainer Brestan &lt;<a href="mailto:rainer.brestan@gmx.net">rainer.brestan@gmx.net</a>&gt; wrote:</div>

<br class="Apple-interchange-newline">

<blockquote type="cite">

<div>

<div style="font-family: Verdana;font-size: 12.0px;"> 

<div>

<div>Hi Steve,</div>

<div>The pgsql RA does the same: it compares the last_xlog_replay_location of all nodes to choose the master to promote.</div>

<div>Promoting by restart instead of the promote command, to preserve the timeline ID, is also a configurable option (restart_on_promote) of the current RA.</div>

<div>And the RA is definitely capable of handling more than two instances. It goes through the node_list parameter and performs its actions for every member of the list.</div>

<div>Originally it may have been planned for only one slave, but the current implementation does not have this limitation. It has code for synchronous replication across more than two nodes, so that nodes which fall back to async are not promoted.</div>
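<div>As an illustration of the comparison described above, here is a minimal Python sketch (not the RA's actual shell code) that picks the node with the highest replay location. It assumes each node reports its location in the usual hexadecimal "hi/lo" text form; the node names and values are made up for the example.</div>

<pre>
```python
def parse_xlog_location(text):
    """Turn an 'hi/lo' hexadecimal xlog location into a sortable (hi, lo) tuple."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16), int(lo, 16))


def furthest_ahead(replay_locations):
    """Return the name of the node whose replay location is the highest."""
    return max(
        replay_locations,
        key=lambda node: parse_xlog_location(replay_locations[node]),
    )


# Hypothetical sample values, as each node might report them.
locations = {
    "p1.example.net": "0/9000158",
    "p2.example.net": "0/A000020",
    "p3.example.net": "0/9FFFFF0",
}
print(furthest_ahead(locations))  # p2.example.net
```
</pre>

<div>Comparing the parsed numbers rather than the raw strings matters, since "0/10000000" sorts before "0/9000000" as text.</div>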

<div> </div>

<div>Of course, I will share the extensions with the community when they are ready for use. The ability to run more than two instances is not removed. I am not running more than two instances on one site; the current setup has two instances per site across two sites, with the master managed by booth. But running more than two instances on one site is also under discussion, so that availability is not interrupted when one server is down and the other is promoted with a restart.</div>

<div>The implementation is nearly finished; stress tests of failure scenarios come next.</div>

<div> </div>

<div>Rainer</div>

<div name="quote" style="margin: 10px 5px 5px 10px; padding: 10px 0px 10px 10px; border-left-width: 2px; border-left-style: solid; border-left-color: rgb(195, 217, 229); word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; position: static; z-index: auto; ">

<div style="margin:0 0 10px 0;"><b>Sent:</b> Tuesday, 26 March 2013 at 11:55<br>

<b>From:</b> "Steven Bambling" &lt;<a href="mailto:smbambling@arin.net">smbambling@arin.net</a>&gt;<br>

<b>To:</b> "The Pacemaker cluster resource manager" &lt;<a href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</a>&gt;<br>

<b>Subject:</b> Re: [Pacemaker] OCF Resource agent promote question</div>

<div name="quoted-content">

<div> 

<div>

<div>On Mar 26, 2013, at 6:32 AM, Rainer Brestan &lt;<a href="x-msg://211/rainer.brestan@gmx.net" target="_parent">rainer.brestan@gmx.net</a>&gt; wrote:</div>

<blockquote>

<div>

<div style="font-family: Verdana;font-size: 12.0px;"> 

<div>

<div>Hi Steve,</div>

<div>When Pacemaker performs a promotion, it has already selected a specific node to become master.</div>

<div>It is far too late at this stage to try to update master scores.</div>

<div> </div>

<div>But there is another problem with xlog in PostgreSQL.</div>

<div> </div>

<div>According to some discussion on the PostgreSQL mailing lists, xlog records that are not relevant for replay are not counted in the xlog position during redo and/or startup. This is especially true for CHECKPOINT xlog records, where the situation can easily be reproduced.</div>

<div>This can lead to a situation where replication is up to date, but the slave shows a lower xlog value.</div>

<div>This issue was solved in 9.2.3, where the WAL receiver always counts the end of applied records.</div>

</div>

</div>

</div>

</blockquote>

<div> </div>

We are currently testing with 9.2.3. I'm using the functions from <a href="http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html" target="_blank">http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html</a>, along with a tweaked function that returns the replay_lag in bytes for a more accurate measurement.
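<div>For illustration, a byte-level lag can be sketched in Python as below. This is an assumption about what such a function computes (received location minus replayed location on one replica), not the tweaked function itself, and the sample locations are made up. PostgreSQL 9.2 also provides pg_xlog_location_diff() to do the subtraction server-side.</div>

<pre>
```python
def xlog_location_to_bytes(text):
    """Convert an 'hi/lo' hexadecimal xlog location to a byte position.

    Assumption: the location is a plain 64-bit value (hi * 2**32 + lo).
    On 9.2 and earlier the high part is believed to advance every
    0xFF000000 bytes instead, so the multiplier would need adjusting there.
    """
    hi, lo = text.strip().split("/")
    return int(hi, 16) * 2**32 + int(lo, 16)


def replay_lag_bytes(receive_location, replay_location):
    """Bytes received from the master but not yet replayed on this replica."""
    received = xlog_location_to_bytes(receive_location)
    replayed = xlog_location_to_bytes(replay_location)
    return received - replayed


# Hypothetical sample values for one replica.
print(replay_lag_bytes("0/A000020", "0/9FFFFF0"))  # 48
```
</pre>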

<blockquote>

<div>

<div style="font-family: Verdana;font-size: 12.0px;">

<div>

<div> </div>

<div>There is also a second, tedious issue. The timeline change is replicated to the slaves, but they do not save it anywhere. If a slave starts up again and does not have access to the WAL archive, it cannot start any more. This was also addressed by a patch in the 9.2 branch, but I haven't tested whether it is fixed in 9.2.3.</div>

</div>

</div>

</div>

</blockquote>

<br>

After talking with one of the Postgres guys, it was recommended that we look at an alternative to the built-in trigger file, which makes the master jump to a new timeline. Instead, we are moving recovery.conf to recovery.done in place via the resource agent and then restarting the postgresql service on the "new" master, so that it keeps the original timeline that the slaves will recognize.
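<div>The rename step described above could look roughly like this in Python. It is only a sketch: the function name is hypothetical, the data directory is passed in by the caller, and restarting the postgresql service afterwards is left out because the command is site-specific.</div>

<pre>
```python
import os
import tempfile


def disable_recovery(data_dir):
    """Rename recovery.conf to recovery.done inside data_dir.

    Returns True if the file was renamed, False if there was nothing to rename.
    """
    conf = os.path.join(data_dir, "recovery.conf")
    done = os.path.join(data_dir, "recovery.done")
    if not os.path.exists(conf):
        return False
    # os.replace overwrites an existing recovery.done; on POSIX the rename
    # is atomic when both paths are on the same filesystem.
    os.replace(conf, done)
    return True


# Demonstration against a throwaway directory rather than a real data directory.
with tempfile.TemporaryDirectory() as data_dir:
    open(os.path.join(data_dir, "recovery.conf"), "w").close()
    print(disable_recovery(data_dir))    # True
    print(sorted(os.listdir(data_dir)))  # ['recovery.done']
```
</pre>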

<blockquote>

<div>

<div style="font-family: Verdana;font-size: 12.0px;">

<div>

<div> </div>

<div>For data replication, whether with PostgreSQL or any other database, you always have two ways of working.</div>

<div>- Data consistency is the topmost priority. Don't go into operation unless everything is fine.</div>

<div>- Availability is the topmost priority. Always try to have at least one running instance, even if its data might not be the latest.</div>

<div> </div>

<div>The current pgsql RA does quite a good job for the first choice.</div>

<div> </div>

<div>It currently has some limitations.</div>

<div>- After a switchover, whether manual or automatic, it needs some work from maintenance personnel.</div>

<div>- Some failure scenarios involving a series of faults leave the cluster without a master until someone intervenes manually.</div>

<div>- Geo-redundant replication with the multi-site cluster ticket system (booth) does not work.</div>

<div>- If availability or unattended operation is the priority, it cannot be used out of the box.</div>

<div> </div>

<div>But it has a very good structure to be extended for other needs.</div>

<div> </div>

<div>And this is what I am currently implementing:</div>

<div>extending the RA to support both ways of working and preparing it for a multi-site cluster ticket system.</div>

</div>

</div>

</div>

</blockquote>

<div> </div>

Would you be willing to share your extended RA? Also, do you run a cluster with more than 2 nodes?</div>

<div> </div>

<div>v/r</div>

<div> </div>

<div>STEVE</div>

<div> </div>

<div> 

<blockquote>

<div>

<div style="font-family: Verdana;font-size: 12.0px;">

<div>

<div> </div>

<div>Regards, Rainer</div>

<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left: 2.0px solid rgb(195,217,229);">

<div style="margin: 0 0 10.0px 0;"><b>Sent:</b> Tuesday, 26 March 2013 at 00:01<br>

<b>From:</b> "Andreas Kurz" &lt;<a href="x-msg://211/andreas@hastexo.com" target="_parent">andreas@hastexo.com</a>&gt;<br>

<b>To:</b> <a href="x-msg://211/pacemaker@oss.clusterlabs.org" target="_parent">pacemaker@oss.clusterlabs.org</a><br>

<b>Subject:</b> Re: [Pacemaker] OCF Resource agent promote question</div>

<div>Hi Steve,<br>

<br>

On 2013-03-25 18:44, Steven Bambling wrote:<br>

> All,<br>

><br>

> I'm trying to work on an OCF resource agent that uses postgresql<br>

> streaming replication. I'm running into a few issues that I hope might<br>

> be answered or at least some pointers given to steer me in the right<br>

> direction.<br>

<br>

Why are you not using the existing pgsql RA? It is capable of doing<br>

synchronous and asynchronous replication and it is known to work fine.<br>

<br>

Best regards,<br>

Andreas<br>

<br>

--<br>

Need help with Pacemaker?<br>

<a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>

<br>

<br>

><br>

> 1. A quick way of obtaining a list of "Online" nodes in the cluster<br>

> that a resource will be able to migrate to. I've accomplished it with<br>

> some grep and sed, but it's not pretty or fast.<br>

><br>

> # time pcs status | grep Online | sed -e "s/.*\[\(.*\)\]/\1/" | sed 's/ //'<br>

> p1.example.net p2.example.net<br>

><br>

> real 0m2.797s<br>

> user 0m0.084s<br>

> sys 0m0.024s<br>

><br>

> Once I get a list of active/online nodes in the cluster my thinking was<br>

> to use PSQL to get the current xlog location and lag of each of the<br>

> remaining nodes and compare them. If the node has a greater log<br>

> position and/or less lag it will be given a greater master preference.<br>

><br>

> 2. How to force a monitor/probe before a promote is run on ALL nodes to<br>

> make sure that the master preference is up to date before<br>

> migrating/failing over the resource.<br>

> - I was thinking that maybe during the promote call it could get the log<br>

> location and lag from each of the nodes via a psql call (like above)<br>

> and then force the resource to a specific node. Is there a way to do<br>

> this, and does it sound like a sane idea?<br>

><br>

><br>

> The start of my RA is located here; suggestions and comments are 100%<br>

> welcome: <a href="https://github.com/smbambling/pgsqlsr/blob/master/pgsqlsr" target="_blank">https://github.com/smbambling/pgsqlsr/blob/master/pgsqlsr</a><br>

><br>

> v/r<br>

><br>

> STEVE<br>

><br>

><br>

> _______________________________________________<br>

> Pacemaker mailing list: <a href="x-msg://211/Pacemaker@oss.clusterlabs.org" target="_parent">

Pacemaker@oss.clusterlabs.org</a><br>

> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">

http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

><br>

> Project Home: <a href="http://www.clusterlabs.org/" target="_blank">http://www.clusterlabs.org</a><br>

> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">

http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

> Bugs: <a href="http://bugs.clusterlabs.org/" target="_blank">http://bugs.clusterlabs.org</a><br>

><br>

<br>

<br>

<br>

<br>

</div>

</div>

</div>

</div>

</div>

</blockquote>

</div>

</div>

</div>

</div>

</div>

</div>

</div>


</blockquote>

</div>

<br>

</div>

</div>


</blockquote>

</div>

<br>

</body>

</html>