[ClusterLabs] Replicated PGSQL woes

Ken Gaillot kgaillot at redhat.com
Thu Oct 13 13:41:56 EDT 2016


On 10/13/2016 12:04 PM, Israel Brewster wrote:
> Summary: Two-node cluster setup with latest pgsql resource agent.
> Postgresql starts initially, but failover never happens.
> 
> Details:
> 
> I'm trying to get a cluster set up with Postgresql 9.6 in a streaming
> replication using named slots scenario. I'm using the latest pgsql
> Resource Agent, which does appear to support the named replication slot
> feature, and I've pulled in the various utility functions the RA uses
> that weren't available in my base install, so the RA itself no longer
> gives me errors.
> 
> Setup: Two machines, centtest1 and centtest2. Both are running CentOS
> 6.8. Centtest1 has an IP of 10.211.55.100, and centtest2 has an IP of
> 10.211.55.101. The cluster is set up and functioning, with a shared
> virtual IP resource at 10.211.55.200. Postgresql has been set up and
> tested functioning properly on both nodes with centtest1 as the master
> and centtest2 as the streaming replica slave. 
> 
> I then set up the postgresql master/slave resource using the following
> commands:
> 
> pcs resource create pgsql_96 pgsql \
> pgctl="/usr/pgsql-9.6/bin/pg_ctl" \
> logfile="/var/log/pgsql/test2.log" \
> psql="/usr/pgsql-9.6/bin/psql" \
> pgdata="/pgsql96/data" \
> rep_mode="async" \
> repuser="postgres" \
> node_list="tcentest1.ravnalaska.net centtest2.ravnalaska.net" \
> master_ip="10.211.55.200" \
> archive_cleanup_command="" \
> restart_on_promote="true" \
> replication_slot_name="centtest_2_slot" \
> monitor_user="postgres" \
> monitor_password="SuperSecret" \
> op start timeout="60s" interval="0s" on-fail="restart" \
> op monitor timeout="60s" interval="4s" on-fail="restart" \
> op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" \
> op promote timeout="60s" interval="0s" on-fail="restart" \
> op demote timeout="60s" interval="0s" on-fail=stop \
> op stop timeout="60s" interval="0s" on-fail="block" \
> op notify timeout="60s" interval="0s";
> 
> pcs resource master msPostgresql pgsql_96 master-max=1 master-node-max=1
> clone-max=2 clone-node-max=1 notify=true
> 
> pcs constraint colocation add virtual_ip with Master msPostgresql INFINITY
> pcs constraint order promote msPostgresql then start virtual_ip
> symmetrical=false score=INFINITY
> pcs constraint order demote  msPostgresql then stop  virtual_ip
> symmetrical=false score=0
> 
> My preference would be that the master runs on centtest1, so I add the
> following constraint as well:
> 
> pcs constraint location --master msPostgresql prefers
> centtest1.ravnalaska.net=50
> 
> When I then start the cluster, I first see *both* machines come up as
> "slave", which I feel is somewhat odd, however the cluster software
> quickly figures things out and promotes centtest2 to master. I've tried

This is inherent to pacemaker's model of multistate resources. Instances
are always started in slave mode, and then promotion to master is a
separate step.
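If it helps to see it, that intermediate state is observable: a one-shot status check right after startup would show something along these lines (illustrative output only; exact formatting varies by Pacemaker version):

```shell
# One-shot status snapshot; for a moment after startup, both pgsql_96
# instances show up as Slaves, and the promote runs in a later transition.
crm_mon -1
#  Master/Slave Set: msPostgresql [pgsql_96]
#      Slaves: [ centtest1.ravnalaska.net centtest2.ravnalaska.net ]
```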

> this a dozen different times, and it *always* promotes centtest2 to
> master - even if I put INFINITY in for the location constraint.

Location constraints do not directly support limiting to one role (your
"--master" option is ignored; I'm surprised it doesn't give an error).
To restrict the preference to the master role, you need a rule, like:

pcs constraint location msPostgresql rule \
   role=master score=50 \
   #uname eq centtest1.ravnalaska.net
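
After adding it, you can sanity-check the result without touching the
running resources (a sketch, assuming a standard pcs/Pacemaker 1.1
install):

```shell
# Show the full constraint definitions, including the rule and its score
pcs constraint location show --full

# Dry-run the policy engine against the live CIB and inspect the
# per-node promotion scores it computes
crm_simulate -sL | grep -i promotion
```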


> But whatever- this is a cluster, it doesn't really matter which node
> things are running on, as long as they are running. So the cluster is
> working - postgresql starts, the master process is on the same node as
> the IP, you can connect, etc, everything looks good. Obviously the next
> thing to try is failover - should the master node fail, the slave node
> should be promoted to master. So I try testing this by shutting down the
> cluster on the primary server: "pcs cluster stop"
> ...and nothing happens. The master shuts down (uncleanly, I might add -
> it leaves behind a lock file that prevents it from starting again until
> I manually remove said lock file), but the slave is never promoted to

This definitely needs to be corrected. What creates the lock file, and
how is that entity managed?

> master. Neither pcs status or crm_mon show any errors, but centtest1
> never becomes master.

I remember a situation where a resource agent improperly set master
scores, which led to no master being promoted, but I don't remember the
details.
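
One thing worth checking when that happens: the pgsql RA drives
promotion through a per-node master score attribute (master-pgsql_96 in
your logs). If it is stuck at -INFINITY on the surviving node, that node
is ineligible for promotion no matter what the constraints say. For
example (attribute name taken from your logs):

```shell
# Query the transient master score for pgsql_96 on centtest1 (-G = get)
crm_attribute -l reboot -N centtest1.ravnalaska.net \
    -n master-pgsql_96 -G

# Or display all node attributes alongside the cluster status
crm_mon -A1
```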

> 
> If instead of stopping the cluster on centtest2, I try to simply move the
> master using the command "pcs resource move --master msPostgresql", I
> first run into the aforementioned unclean shutdown issue (lock file left
> behind that has to be manually removed), and after removing the lock
> file, I wind up with *both* nodes being slaves, and no master node. "pcs
> resource clear --master msPostgresql" re-promotes centtest2 to master.
> 
> What it looks like is that for some reason pacemaker/corosync is
> absolutely refusing to ever make centtest1 a master - even when I
> explicitly tell it to, or when it is the only node left.
> 
> Looking at the messages log when I do the node shutdown test I see this:
> 
> Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: State transition S_IDLE
> -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> Oct 13 08:29:39 CentTest1 pengine[30095]:   notice: On loss of CCM
> Quorum: Ignore
> Oct 13 08:29:39 CentTest1 pengine[30095]:   notice: Stop 
>   virtual_ip#011(centtest2.ravnalaska.net)
> Oct 13 08:29:39 CentTest1 pengine[30095]:   notice:
> Demote  pgsql_96:0#011(Master -> Stopped centtest2.ravnalaska.net)
> Oct 13 08:29:39 CentTest1 pengine[30095]:   notice: Calculated
> Transition 193: /var/lib/pacemaker/pengine/pe-input-500.bz2
> Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Initiating action 43:
> notify pgsql_96_pre_notify_demote_0 on centtest2.ravnalaska.net
> Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Initiating action 45:
> notify pgsql_96_pre_notify_demote_0 on centtest1.ravnalaska.net (local)
> Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Operation
> pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=230, rc=0,
> cib-update=0, confirmed=true)
> Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Initiating action 6:
> demote pgsql_96_demote_0 on centtest2.ravnalaska.net
> Oct 13 08:29:39 CentTest1 crmd[30096]:   notice: Transition aborted by
> status-centtest2.ravnalaska.net-master-pgsql_96,
> master-pgsql_96=-INFINITY: Transient attribute change (modify
> cib=0.170.19, source=abort_unless_down:329,
> path=/cib/status/node_state[@id='centtest2.ravnalaska.net']/transient_attributes[@id='centtest2.ravnalaska.net']/instance_attributes[@id='status-centtest2.ravnalaska.net']/nvpair[@id='status-centtest2.ravnalaska.net-master-pgsql_96'], 0)
> Oct 13 08:29:39 CentTest1 crmd[30096]:  warning: No match for shutdown
> action on centtest2.ravnalaska.net
> Oct 13 08:29:40 CentTest1 crmd[30096]:   notice: Initiating action 44:
> notify pgsql_96_post_notify_demote_0 on centtest2.ravnalaska.net
> Oct 13 08:29:40 CentTest1 crmd[30096]:   notice: Initiating action 46:
> notify pgsql_96_post_notify_demote_0 on centtest1.ravnalaska.net (local)
> Oct 13 08:29:41 CentTest1 pgsql(pgsql_96)[19150]: INFO: My master
> baseline : 000000000700B5C8.
> Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: Sending flush op to
> all hosts for: pgsql_96-master-baseline (000000000700B5C8)
> Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: Sent update 227:
> pgsql_96-master-baseline=000000000700B5C8
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Operation
> pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=231, rc=0,
> cib-update=0, confirmed=true)
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Transition 193
> (Complete=12, Pending=0, Fired=0, Skipped=3, Incomplete=11,
> Source=/var/lib/pacemaker/pengine/pe-input-500.bz2): Stopped
> Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: On loss of CCM
> Quorum: Ignore
> Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Stop 
>   virtual_ip#011(centtest2.ravnalaska.net)
> Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Stop 
>   pgsql_96:0#011(centtest2.ravnalaska.net)
> Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Calculated
> Transition 194: /var/lib/pacemaker/pengine/pe-input-501.bz2
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 4:
> stop virtual_ip_stop_0 on centtest2.ravnalaska.net
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 38:
> notify pgsql_96_pre_notify_stop_0 on centtest2.ravnalaska.net
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 39:
> notify pgsql_96_pre_notify_stop_0 on centtest1.ravnalaska.net (local)
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Operation
> pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=232, rc=0,
> cib-update=0, confirmed=true)
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 5:
> stop pgsql_96_stop_0 on centtest2.ravnalaska.net
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Initiating action 40:
> notify pgsql_96_post_notify_stop_0 on centtest1.ravnalaska.net (local)
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Operation
> pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=233, rc=0,
> cib-update=0, confirmed=true)
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Transition 194
> (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-501.bz2): Complete
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: State transition
> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
> origin=notify_crmd ]
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: State transition S_IDLE
> -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: On loss of CCM
> Quorum: Ignore
> Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Scheduling Node
> centtest2.ravnalaska.net for shutdown
> Oct 13 08:29:41 CentTest1 pengine[30095]:   notice: Calculated
> Transition 195: /var/lib/pacemaker/pengine/pe-input-502.bz2
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: Transition 195
> (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-502.bz2): Complete
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: State transition
> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
> origin=notify_crmd ]
> Oct 13 08:29:41 CentTest1 crmd[30096]:   notice: do_shutdown of
> centtest2.ravnalaska.net (op 30) is complete
> Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: crm_update_peer_proc:
> Node centtest2.ravnalaska.net[2] - state is now lost (was member)
> Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: Removing
> centtest2.ravnalaska.net/2 from the membership list
> Oct 13 08:29:41 CentTest1 attrd[30094]:   notice: Purged 1 peers with
> id=2 and/or uname=centtest2.ravnalaska.net from the membership cache
> Oct 13 08:29:41 CentTest1 stonith-ng[30092]:   notice:
> crm_update_peer_proc: Node centtest2.ravnalaska.net[2] - state is now
> lost (was member)
> Oct 13 08:29:41 CentTest1 stonith-ng[30092]:   notice: Removing
> centtest2.ravnalaska.net/2 from the membership list
> Oct 13 08:29:41 CentTest1 stonith-ng[30092]:   notice: Purged 1 peers
> with id=2 and/or uname=centtest2.ravnalaska.net from the membership cache
> Oct 13 08:29:41 CentTest1 cib[30091]:   notice: crm_update_peer_proc:
> Node centtest2.ravnalaska.net[2] - state is now lost (was member)
> Oct 13 08:29:41 CentTest1 cib[30091]:   notice: Removing
> centtest2.ravnalaska.net/2 from the membership list
> Oct 13 08:29:41 CentTest1 cib[30091]:   notice: Purged 1 peers with id=2
> and/or uname=centtest2.ravnalaska.net from the membership cache
> Oct 13 08:29:43 CentTest1 pgsql(pgsql_96)[19472]: INFO: Master does not
> exist.
> Oct 13 08:29:43 CentTest1 pgsql(pgsql_96)[19472]: WARNING: My data is
> out-of-date. status=DISCONNECT

I'm not familiar with pgsql, but this doesn't look right. The instance
on centtest2 was cleanly demoted and stopped, so I'd expect pgsql to be
consistent at this point.

> Oct 13 08:29:43 CentTest1 corosync[29835]:   [QUORUM] Members[1]: 1
> Oct 13 08:29:43 CentTest1 corosync[29835]:   [TOTEM ] A processor joined
> or left the membership and a new membership was formed.
> Oct 13 08:29:43 CentTest1 corosync[29835]:   [CPG   ] chosen downlist:
> sender r(0) ip(10.211.55.100) ; members(old:2 left:1)
> Oct 13 08:29:43 CentTest1 corosync[29835]:   [MAIN  ] Completed service
> synchronization, ready to provide service.
> Oct 13 08:29:43 CentTest1 kernel: dlm: closing connection to node 2
> Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: cman_event_callback:
> Node centtest2.ravnalaska.net[2] - state is now lost (was member)
> Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: do_shutdown of
> centtest2.ravnalaska.net (op 30) is complete
> Oct 13 08:29:43 CentTest1 pacemakerd[30085]:   notice:
> cman_event_callback: Node centtest2.ravnalaska.net[2] - state is now
> lost (was member)
> Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: State transition S_IDLE
> -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL
> origin=check_join_state ]
> Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending full refresh
> (origin=crmd)
> Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending flush op to
> all hosts for: pgsql_96-status (HS:alone)
> Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending flush op to
> all hosts for: shutdown (0)
> Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending flush op to
> all hosts for: master-pgsql_96 (-INFINITY)
> Oct 13 08:29:43 CentTest1 attrd[30094]:   notice: Sending flush op to
> all hosts for: pgsql_96-master-baseline (000000000700B5C8)
> Oct 13 08:29:43 CentTest1 pengine[30095]:   notice: On loss of CCM
> Quorum: Ignore
> Oct 13 08:29:43 CentTest1 pengine[30095]:   notice: Calculated
> Transition 196: /var/lib/pacemaker/pengine/pe-input-503.bz2
> Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: Transition 196
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-503.bz2): Complete
> Oct 13 08:29:43 CentTest1 crmd[30096]:   notice: State transition
> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
> origin=notify_crmd ]
> Oct 13 08:29:47 CentTest1 pgsql(pgsql_96)[19602]: INFO: Master does not
> exist.
> Oct 13 08:29:47 CentTest1 pgsql(pgsql_96)[19602]: WARNING: My data is
> out-of-date. status=DISCONNECT
> Oct 13 08:29:51 CentTest1 pgsql(pgsql_96)[19730]: INFO: Master does not
> exist.
> Oct 13 08:29:51 CentTest1 pgsql(pgsql_96)[19730]: WARNING: My data is
> out-of-date. status=DISCONNECT
> 
> Those last two lines repeat indefinitely, but there is no indication
> that the cluster ever tries to promote centtest1 to master. Even if I
> completely shut down the cluster, and bring it back up only on
> centtest1, pacemaker refuses to start postgresql on centtest1 as a master.
> 
> What can I do to fix this? What troubleshooting steps can I follow? Thanks.
> 
> 
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------



