[ClusterLabs] Replicated PGSQL woes

Thu Oct 13 19:04:18 CEST 2016

Summary: Two-node cluster setup with latest pgsql resource agent. Postgresql starts initially, but failover never happens.

Details:

I'm trying to get a cluster set up with Postgresql 9.6 in a streaming replication using named slots scenario. I'm using the latest pgsql Resource Agent, which does appear to support the named replication slot feature, and I've pulled in the various utility functions the RA uses that weren't available in my base install, so the RA itself no longer gives me errors.

Setup: Two machines, centtest1 and centtest2. Both are running CentOS 6.8. Centtest1 has an IP of 10.211.55.100, and centtest2 has an IP of 10.211.55.101. The cluster is set up and functioning, with a shared virtual IP resource at 10.211.55.200. Postgresql has been set up and tested functioning properly on both nodes with centtest1 as the master and centtest2 as the streaming replica slave. 

I then set up the postgresql master/slave resource using the following commands:

pcs resource create pgsql_96 pgsql \
pgctl="/usr/pgsql-9.6/bin/pg_ctl" \
logfile="/var/log/pgsql/test2.log" \
psql="/usr/pgsql-9.6/bin/psql" \
pgdata="/pgsql96/data" \
rep_mode="async" \
repuser="postgres" \
node_list="tcentest1.ravnalaska.net centtest2.ravnalaska.net" \
master_ip="10.211.55.200" \
archive_cleanup_command="" \
restart_on_promote="true" \
replication_slot_name="centtest_2_slot" \
monitor_user="postgres" \
monitor_password="SuperSecret" \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="4s" on-fail="restart" \
op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" \
op promote timeout="60s" interval="0s" on-fail="restart" \
op demote timeout="60s" interval="0s" on-fail=stop \
op stop timeout="60s" interval="0s" on-fail="block" \
op notify timeout="60s" interval="0s";

pcs resource master msPostgresql pgsql_96 master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs constraint colocation add virtual_ip with Master msPostgresql INFINITY
pcs constraint order promote msPostgresql then start virtual_ip symmetrical=false score=INFINITY
pcs constraint order demote  msPostgresql then stop  virtual_ip symmetrical=false score=0

My preference would be that the master runs on centtest1, so I add the following constraint as well:

pcs constraint location --master msPostgresql prefers centtest1.ravnalaska.net=50

When I then start the cluster, I first see *both* machines come up as "slave", which I feel is somewhat odd, however the cluster software quickly figures things out and promotes centtest2 to master. I've tried this a dozen different times, and it *always* promotes centtest2 to master - even if I put INFINITY in for the location constraint.

But whatever- this is a cluster, it doesn't really matter which node things are running on, as long as they are running. So the cluster is working - postgresql starts, the master process is on the same node as the IP, you can connect, etc, everything looks good. Obviously the next thing to try is failover - should the master node fail, the slave node should be promoted to master. So I try testing this by shutting down the cluster on the primary server: "pcs cluster stop"
...and nothing happens. The master shuts down (uncleanly, I might add - it leaves behind a lock file that prevents it from starting again until I manually remove said lock file), but the slave is never promoted to master. Neither pcs status or crm_mon show any errors, but centtest1 never becomes master.

If instead of stoping the cluster on centtest2, I try to simply move the master using the command "pcs resource move --master msPostgresql", I first run into the aforementioned unclean shutdown issue (lock file left behind that has to be manually removed), and after removing the lock file, I wind up with *both* nodes being slaves, and no master node. "pcs resource clear --master msPostgresql" re-promotes centtest2 to master.

What it looks like is that for some reason pacemaker/corosync is absolutely refusing to ever make centtest1 a master - even when I explicitly tell it to, or when it is the only node left.

Looking at the messages log when I do the node shutdown test I see this:

Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 13 08:29:39 CentTest1 pengine[30095]:   notice: On loss of CCM Quorum: Ignore
Oct 13 08:29:39 CentTest1 pengine[30095]:   notice: Stop    virtual_ip#011(centtest2.ravnalaska.net)
Oct 13 08:29:39 CentTest1 pengine[30095]:   notice: Demote  pgsql_96:0#011(Master -> Stopped centtest2.ravnalaska.net)
Oct 13 08:29:39 CentTest1 pengine[30095]:   notice: Calculated Transition 193: /var/lib/pacemaker/pengine/pe-input-500.bz2
Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Initiating action 43: notify pgsql_96_pre_notify_demote_0 on centtest2.ravnalaska.net
Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Initiating action 45: notify pgsql_96_pre_notify_demote_0 on centtest1.ravnalaska.net (local)
Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Operation pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=230, rc=0, cib-update=0, confirmed=true)
Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Initiating action 6: demote pgsql_96_demote_0 on centtest2.ravnalaska.net
Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Transition aborted by status-centtest2.ravnalaska.net-master-pgsql_96, master-pgsql_96=-INFINITY: Transient attribute change (modify cib=0.170.19, source=abort_unless_down:329, path=/cib/status/node_state[@id='centtest2.ravnalaska.net']/transient_attributes[@id='centtest2.ravnalaska.net']/instance_attributes[@id='status-centtest2.ravnalaska.net']/nvpair[@id='status-centtest2.ravnalaska.net-master-pgsql_96'], 0)
Oct 13 08:29:39 CentTest1 crmd[30096]:  warning: No match for shutdown action on centtest2.ravnalaska.net
Oct 13 08:29:40 CentTest1 crmd[30096]:   notice: Initiating action 44: notify pgsql_96_post_notify_demote_0 on centtest2.ravnalaska.net
Oct 13 08:29:40 CentTest1 crmd[30096]:   notice: Initiating action 46: notify pgsql_96_post_notify_demote_0 on centtest1.ravnalaska.net (local)
Oct 13 08:29:41 CentTest1 pgsql(pgsql_96)[19150]: INFO: My master baseline : 000000000700B5C8.
Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: Sending flush op to all hosts for: pgsql_96-master-baseline (000000000700B5C8)
Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: Sent update 227: pgsql_96-master-baseline=000000000700B5C8
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Operation pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=231, rc=0, cib-update=0, confirmed=true)
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Transition 193 (Complete=12, Pending=0, Fired=0, Skipped=3, Incomplete=11, Source=/var/lib/pacemaker/pengine/pe-input-500.bz2): Stopped
Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: On loss of CCM Quorum: Ignore
Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Stop    virtual_ip#011(centtest2.ravnalaska.net)
Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Stop    pgsql_96:0#011(centtest2.ravnalaska.net)
Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Calculated Transition 194: /var/lib/pacemaker/pengine/pe-input-501.bz2
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 4: stop virtual_ip_stop_0 on centtest2.ravnalaska.net
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 38: notify pgsql_96_pre_notify_stop_0 on centtest2.ravnalaska.net
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 39: notify pgsql_96_pre_notify_stop_0 on centtest1.ravnalaska.net (local)
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Operation pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=232, rc=0, cib-update=0, confirmed=true)
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 5: stop pgsql_96_stop_0 on centtest2.ravnalaska.net
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 40: notify pgsql_96_post_notify_stop_0 on centtest1.ravnalaska.net (local)
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Operation pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=233, rc=0, cib-update=0, confirmed=true)
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Transition 194 (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-501.bz2): Complete
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: On loss of CCM Quorum: Ignore
Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Scheduling Node centtest2.ravnalaska.net for shutdown
Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Calculated Transition 195: /var/lib/pacemaker/pengine/pe-input-502.bz2
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Transition 195 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-502.bz2): Complete
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: do_shutdown of centtest2.ravnalaska.net (op 30) is complete
Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: crm_update_peer_proc: Node centtest2.ravnalaska.net[2] - state is now lost (was member)
Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: Removing centtest2.ravnalaska.net/2 from the membership list
Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: Purged 1 peers with id=2 and/or uname=centtest2.ravnalaska.net from the membership cache
Oct 13 08:29:41 CentTest1 stonith-ng[30092]:   notice: crm_update_peer_proc: Node centtest2.ravnalaska.net[2] - state is now lost (was member)
Oct 13 08:29:41 CentTest1 stonith-ng[30092]:   notice: Removing centtest2.ravnalaska.net/2 from the membership list
Oct 13 08:29:41 CentTest1 stonith-ng[30092]:   notice: Purged 1 peers with id=2 and/or uname=centtest2.ravnalaska.net from the membership cache
Oct 13 08:29:41 CentTest1 cib[30091]:   notice: crm_update_peer_proc: Node centtest2.ravnalaska.net[2] - state is now lost (was member)
Oct 13 08:29:41 CentTest1 cib[30091]:   notice: Removing centtest2.ravnalaska.net/2 from the membership list
Oct 13 08:29:41 CentTest1 cib[30091]:   notice: Purged 1 peers with id=2 and/or uname=centtest2.ravnalaska.net from the membership cache
Oct 13 08:29:43 CentTest1 pgsql(pgsql_96)[19472]: INFO: Master does not exist.
Oct 13 08:29:43 CentTest1 pgsql(pgsql_96)[19472]: WARNING: My data is out-of-date. status=DISCONNECT
Oct 13 08:29:43 CentTest1 corosync[29835]:   [QUORUM] Members[1]: 1
Oct 13 08:29:43 CentTest1 corosync[29835]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 13 08:29:43 CentTest1 corosync[29835]:   [CPG   ] chosen downlist: sender r(0) ip(10.211.55.100) ; members(old:2 left:1)
Oct 13 08:29:43 CentTest1 corosync[29835]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 13 08:29:43 CentTest1 kernel: dlm: closing connection to node 2
Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: cman_event_callback: Node centtest2.ravnalaska.net[2] - state is now lost (was member)
Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: do_shutdown of centtest2.ravnalaska.net (op 30) is complete
Oct 13 08:29:43 CentTest1 pacemakerd[30085]:   notice: cman_event_callback: Node centtest2.ravnalaska.net[2] - state is now lost (was member)
Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending full refresh (origin=crmd)
Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending flush op to all hosts for: pgsql_96-status (HS:alone)
Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending flush op to all hosts for: shutdown (0)
Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending flush op to all hosts for: master-pgsql_96 (-INFINITY)
Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending flush op to all hosts for: pgsql_96-master-baseline (000000000700B5C8)
Oct 13 08:29:43 CentTest1 pengine[30095]:   notice: On loss of CCM Quorum: Ignore
Oct 13 08:29:43 CentTest1 pengine[30095]:   notice: Calculated Transition 196: /var/lib/pacemaker/pengine/pe-input-503.bz2
Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: Transition 196 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-503.bz2): Complete
Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 13 08:29:47 CentTest1 pgsql(pgsql_96)[19602]: INFO: Master does not exist.
Oct 13 08:29:47 CentTest1 pgsql(pgsql_96)[19602]: WARNING: My data is out-of-date. status=DISCONNECT
Oct 13 08:29:51 CentTest1 pgsql(pgsql_96)[19730]: INFO: Master does not exist.
Oct 13 08:29:51 CentTest1 pgsql(pgsql_96)[19730]: WARNING: My data is out-of-date. status=DISCONNECT

Those last two lines repeat indefinitely, but there is no indication that the cluster ever tries to promote centtest1 to master. Even if I completely shut down the cluster, and bring it back up only on centtest1, pacemaker refuses to start postgresql on centtest1 as a master.

What can I do to fix this? What troubleshooting steps can I follow? Thanks.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://clusterlabs.org/pipermail/users/attachments/20161013/d668add7/attachment-0002.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Israel Brewster.vcf
Type: text/directory
Size: 417 bytes
Desc: not available
URL: <http://clusterlabs.org/pipermail/users/attachments/20161013/d668add7/attachment-0001.bin>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://clusterlabs.org/pipermail/users/attachments/20161013/d668add7/attachment-0003.html>