On Oct 14, 2016, at 1:39 AM, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:

> On Thu, 13 Oct 2016 14:11:06 -0800
> Israel Brewster <israel@ravnalaska.net> wrote:
>
>> On Oct 13, 2016, at 1:56 PM, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>>
>>> On Thu, 13 Oct 2016 10:05:33 -0800
>>> Israel Brewster <israel@ravnalaska.net> wrote:
>>>
>>>> On Oct 13, 2016, at 9:41 AM, Ken Gaillot <kgaillot@redhat.com> wrote:
>>>>
>>>>> On 10/13/2016 12:04 PM, Israel Brewster wrote:
>>>
>>> [...]
>>>
>>>>>> But whatever - this is a cluster, it doesn't really matter which node
>>>>>> things are running on, as long as they are running. So the cluster is
>>>>>> working - postgresql starts, the master process is on the same node as
>>>>>> the IP, you can connect, etc., everything looks good. Obviously the next
>>>>>> thing to try is failover - should the master node fail, the slave node
>>>>>> should be promoted to master. So I try testing this by shutting down the
>>>>>> cluster on the primary server: "pcs cluster stop"
>>>>>> ...and nothing happens. The master shuts down (uncleanly, I might add -
>>>>>> it leaves behind a lock file that prevents it from starting again until
>>>>>> I manually remove said lock file), but the slave is never promoted to
>>>>>
>>>>> This definitely needs to be corrected. What creates the lock file, and
>>>>> how is that entity managed?
>>>>
>>>> The lock file entity is created/managed by the postgresql process itself.
>>>> On launch, postgres creates the lock file to say it is running, and
>>>> deletes said lock file when it shuts down. To my understanding, its role
>>>> in life is to prevent a restart after an unclean shutdown so the admin is
>>>> reminded to make sure that the data is in a consistent state before
>>>> starting the server again.
>>>
>>> What is the name of this lock file? Where is it?
>>>
>>> PostgreSQL does not create a lock file. It creates a "postmaster.pid" file,
>>> but it does not forbid a startup if the new process doesn't find another
>>> process with the pid and shm shown in postmaster.pid.
>>>
>>> As far as I know, the pgsql resource agent creates such a lock file on
>>> promote and deletes it on graceful stop. If the PostgreSQL instance
>>> couldn't be stopped correctly, the lock file stays and the RA refuses to
>>> start it the next time.
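
Side note for anyone following the thread: those are two separate files. As a
rough sketch of where they would live on a setup like mine - the PGSQL.lock
name and location here are an assumption based on the RA's usual defaults, not
something taken from this cluster:

    ls -l /pgsql96/data/postmaster.pid     # created and removed by PostgreSQL itself
    ls -l /var/lib/pgsql/tmp/PGSQL.lock    # created by the pgsql RA on promote, removed on a graceful stop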
<br class=""></blockquote><br class="">Ah, you're right. Looking auth the RA I see where it creates the file in<br class="">question. The delete appears to be in the pgsql_real_stop() function (which<br class="">makes sense), wrapped in an if block that checks for $1 being master and<br class="">$OCF_RESKEY_CRM_meta_notify_slave_uname being a space. Throwing a little<br class="">debugging code in there I see that when it hits that block on a cluster stop,<br class="">$OCF_RESKEY_CRM_meta_notify_slave_uname is <a href="http://centtest1.ravnalaska.net" class="">centtest1.ravnalaska.net</a><br class=""><<a href="http://centtest1.ravnalaska.net/" class="">http://centtest1.ravnalaska.net/</a>>, not a space, so the lock file is not<br class="">removed:<br class=""><br class=""> if [ "$1" = "master" -a "$OCF_RESKEY_CRM_meta_notify_slave_uname" = " "<br class="">]; then ocf_log info "Removing $PGSQL_LOCK."<br class=""> rm -f $PGSQL_LOCK<br class=""> fi <br class=""><br class="">It doesn't look like there is anywhere else where the file would be removed.<br class=""></blockquote><br class="">This is quite wrong to me for two reasons (I'll try to be clear):<br class=""><br class="">1) the resource agent (RA) make sure the timeline (TL) will not be incremented<br class="">during promotion.<br class=""><br class="">As there is no documentation about that, I'm pretty sure this contortion comes<br class="">from limitations in very old versions of PostgreSQL (<= 9.1):<br class=""><br class=""> * a slave wasn't able to cross a timeline (TL) from streaming replication,<br class=""> only from WAL archives. That means crossing a TL was requiring to restart<br class=""> the slave or cutting the streaming rep temporary to force it to get back to<br class=""> the archives<br class=""> * moreover, it was possible a standby miss some transactions on after a clean<br class=""> master shutdown. That means the old master couldn't get back to the<br class=""> cluster as a slave safely, as the TL is still the same...<br class=""><br class="">See slide 35->37: <a href="http://www.slideshare.net/takmatsuo/2012929-pg-study-16012253" class="">http://www.slideshare.net/takmatsuo/2012929-pg-study-16012253</a><br class=""><br class="">In my understanding, that's why we make sure there's no slave around before<br class="">shutting down the master: should the master go back later cleanly, we make sure<br class="">no one could be promoted in the meantime.<br class=""><br class="">Note that considering this issue and how the RA tries to avoid it, this test on<br class="">slave being shutdown before master is quite weak anyway...<br class=""><br class="">Last but not least, the two PostgreSQL limitations the RA is messing with have<br class="">been fixed long time ago in 9.3:<br class=""> * <a href="https://www.postgresql.org/docs/current/static/release-9-3.html#AEN138909" class="">https://www.postgresql.org/docs/current/static/release-9-3.html#AEN138909</a><br class=""> *<br class=""><a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=985bd7d49726c9f178558491d31a570d47340459" class="">https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=985bd7d49726c9f178558491d31a570d47340459</a><br class=""><br class="">...but it requires PostgreSQL 9.3+ for the timeline issue. 

>>> [...]
>>>
>>>>>> What can I do to fix this? What troubleshooting steps can I follow?
>>>>>> Thanks.
>>>
>>> I cannot find the result of the stop operation in your log files; maybe
>>> the log from CentTest2 would be more useful.
>>
>> Sure. I was looking at centtest1 because I was trying to figure out why it
>> wouldn't promote, but if centtest2 never really stopped (properly), that
>> could explain things. Here's the log from 2 when calling pcs cluster stop:
>>
>> [log log log]
>
> Well, this is a normal shutdown and the master was shut down cleanly. As you
> pointed out, the lock file stayed there because some slaves were still up.
>
> I **guess** if you really want a failover to occur, you need to simulate a
> real failure, not shut down the first node cleanly. Try to kill corosync.
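
Just to be explicit about how I ran that test - this is only a sketch of
simulating a hard failure on the node currently running the master, as opposed
to a clean stop:

    # on the current master node: kill the cluster stack outright
    # instead of running "pcs cluster stop"
    killall -9 corosync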

From an academic standpoint the results of that test (which, incidentally,
were the same as the results of every other test I've done) are interesting;
from a practical standpoint, however, I'm not sure it helps much - most of the
"failures" that I experience are intentional: I want to fail over to the other
machine so I can run some software updates, reboot for whatever reason, shut
down temporarily to upgrade the hardware, or whatever. While handling "real"
failures is, of course, the real purpose of HA, that type of failure should be
pretty rare. I would hope :-)

>>> but I can find this:
>>>
>>>   Oct 13 08:29:41 CentTest1 pengine[30095]: notice: Scheduling Node
>>>   centtest2.ravnalaska.net for shutdown
>>>   ...
>>>   Oct 13 08:29:41 CentTest1 pengine[30095]: notice: Scheduling Node
>>>   centtest2.ravnalaska.net for shutdown
>>>
>>> Which means the stop operation probably raised an error, leading to a
>>> fencing of the node. In this circumstance, I bet PostgreSQL wasn't able to
>>> stop correctly and the lock file stayed in place.
>>>
>>> Could you please show us your full cluster setup?
>>
>> Sure: how? pcs status shows this, but I suspect that's not what you are
>> asking about:
>
> "pcs config" would do the trick.

Here we go:

Cluster Name: cluster_test
Corosync Nodes:
 centtest1.ravnalaska.net centtest2.ravnalaska.net
Pacemaker Nodes:
 centtest1.ravnalaska.net centtest2.ravnalaska.net

Resources:
 Resource: virtual_ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.211.55.200 iflabel=pg0
  Operations: start interval=0s timeout=20s (virtual_ip-start-interval-0s)
              stop interval=0s timeout=20s (virtual_ip-stop-interval-0s)
              monitor interval=30s (virtual_ip-monitor-interval-30s)
 Master: msPostgresql
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: pgsql_96 (class=ocf provider=heartbeat type=pgsql)
   Attributes: pgctl=/usr/pgsql-9.6/bin/pg_ctl logfile=/var/log/pgsql/test2.log psql=/usr/pgsql-9.6/bin/psql pgdata=/pgsql96/data rep_mode=async repuser=postgres node_list="centtest2.ravnalaska.net centest1.ravnalaska.net" master_ip=10.211.55.200 archive_cleanup_command= restart_on_promote=true replication_slot_name=centtest_2_slot monitor_user=postgres monitor_password=SuperSecret
   Operations: start on-fail=restart interval=0s timeout=60s (pgsql_96-start-interval-0s)
               monitor on-fail=restart interval=4s timeout=60s (pgsql_96-monitor-interval-4s)
               monitor interval=3s role=Master timeout=60s on-fail=restart (pgsql_96-monitor-interval-3s)
               promote on-fail=restart interval=0s timeout=60s (pgsql_96-promote-interval-0s)
               demote on-fail=stop interval=0s timeout=60s (pgsql_96-demote-interval-0s)
               stop on-fail=block interval=0s timeout=60s (pgsql_96-stop-interval-0s)
               notify interval=0s timeout=60s (pgsql_96-notify-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: virtual_ip
    Enabled on: centtest2.ravnalaska.net (score:50) (id:location-virtual_ip-centtest2.ravnalaska.net-50)
Ordering Constraints:
  promote msPostgresql then start virtual_ip (score:INFINITY) (non-symmetrical) (id:order-msPostgresql-virtual_ip-INFINITY)
  demote msPostgresql then stop virtual_ip (score:0) (non-symmetrical) (id:order-msPostgresql-virtual_ip-0)
Colocation Constraints:
  virtual_ip with msPostgresql (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-virtual_ip-msPostgresql-INFINITY)

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.14-8.el6_8.1-70404b0
 have-watchdog: false
 last-lrm-refresh: 1476461302
 maintenance-mode: false
 no-quorum-policy: ignore
 stonith-enabled: false
Node Attributes:
 centtest1.ravnalaska.net: pgsql_96-data-status=DISCONNECT
 centtest2.ravnalaska.net: pgsql_96-data-status=LATEST

I find this line particularly interesting: "centtest1.ravnalaska.net:
pgsql_96-data-status=DISCONNECT", especially since it is completely wrong -
centtest1 *is* connected and replicating. It does potentially explain some
things, though.
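
If it's useful, I can cross-check that attribute against what PostgreSQL
itself reports with something like the following sketch (the connection
details are guesses pieced together from the config above):

    # replication as seen by the current master (via the virtual IP)
    /usr/pgsql-9.6/bin/psql -U postgres -h 10.211.55.200 \
        -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
    # the attribute as pacemaker has it recorded for centtest1
    crm_attribute -l forever -N centtest1.ravnalaska.net -n pgsql_96-data-status -G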

Thanks

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
class="">-----------------------------------------------</span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class=""><span style="font-size: 9pt; font-family: Helvetica, sans-serif;" class=""></span></div></div></div><blockquote type="cite" class=""><div class=""><div class=""><br class=""><br class="">-- <br class="">Jehan-Guillaume de Rorthais<br class="">Dalibo<br class=""></div></div></blockquote></div><br class=""></body></html>