<div dir="ltr">Hey folks, <div><br></div><div>After a few battles with it, I managed to get the pgsql RA running on 4 nodes. It's all great, however...</div><div>When testing failover, I unplugged the 'master' machine. The slaves get sorted out and a new master is elected, but the slaves then don't reconnect to the new master. </div>
<div>They all complain about missing files in pg_archive, which I was told to ignore. </div><div style>But they still don't reconnect to the new master to keep replication going. </div><div style><br></div><div style>
<br></div><div><div>cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000007000000010000003F': No such file or directory</div><div>cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000007000000010000003F': No such file or directory</div>
<div>cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000008.history': No such file or directory</div><div>FATAL: timeline 8 of the primary does not match recovery target timeline 7</div><div><br></div></div><div>
<br></div><div style>It's the last line that worries me. </div><div style>Replication doesn't work again until I manually run rsync to sync pg_archive up with the master. </div><div style><br></div><div style>Not sure where I went wrong. </div>
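<div style><br></div><div style>For anyone reading the errors above: as far as I understand, the first 8 hex digits of a WAL segment name are the timeline ID, and a timeline history file is named after its timeline. So the slaves are still asking for timeline 7 segments, plus the timeline 8 history file, while the new master is already on timeline 8. A minimal Python sketch that decodes the names (the helper name wal_timeline is made up for illustration; it is not part of PostgreSQL or the RA):</div>

```python
import re

# WAL segment names are 24 hex digits: timeline, log, segment (8 digits each).
# Timeline history files are 8 hex digits followed by ".history".
WAL_RE = re.compile(r"^([0-9A-Fa-f]{8})[0-9A-Fa-f]{16}$")
HISTORY_RE = re.compile(r"^([0-9A-Fa-f]{8})\.history$")


def wal_timeline(filename):
    """Return the timeline ID encoded in a WAL or history file name, or None."""
    for pattern in (WAL_RE, HISTORY_RE):
        match = pattern.match(filename)
        if match:
            return int(match.group(1), 16)
    return None


if __name__ == "__main__":
    # File names taken from the cp errors in this post.
    for name in ("00000007000000010000003F", "00000008.history"):
        print(name, "timeline", wal_timeline(name))
```

<div style>Run against the two file names from the cp errors, it reports timeline 7 for the segment and timeline 8 for the history file, which matches the FATAL message.</div>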
<div style><br></div><div style>Here's my crm config:</div><div style><br></div><div style><div>node hanode01 \</div><div> attributes pgsql-data-status="DISCONNECT" kernel="2.6.32-279.el6.x86_64" foobar="barfoo"</div>
<div>node hanode02 \</div><div> attributes pgsql-data-status="DISCONNECT"</div><div>node hanode03 \</div><div> attributes pgsql-data-status="LATEST"</div><div>node hanode04 \</div><div> attributes pgsql-data-status="DISCONNECT"</div>
<div>primitive pgsql ocf:heartbeat:pgsql \</div><div> params pgctl="/usr/pgsql-9.2/bin/pg_ctl" psql="/usr/pgsql-9.2/bin/psql" pgdata="/var/lib/pgsql/9.2/data/" restore_command="cp /var/lib/pgsql/9.2/data/pg_archive/\%f \%p" start_opt="-p 5432" rep_mode="async" node_list="hanode01 hanode02 hanode03 hanode04" master_ip="10.0.1.100" stop_escalate="0" repuser="replicator" monitor_password="lemon31ee7" monitor_user="monitor" \</div>
<div> op start interval="0s" role="Master" timeout="260s" on-fail="restart" \</div><div> op monitor interval="2s" role="Master" timeout="260s" on-fail="restart" \</div>
<div> op monitor interval="7s" timeout="260s" on-fail="restart" \</div><div> op promote interval="0s" timeout="260s" on-fail="restart" \</div><div>
op demote interval="0s" timeout="260s" on-fail="stop" \</div><div> op stop interval="0s" timeout="260s" on-fail="block" \</div><div> op notify interval="0s" timeout="260s"</div>
<div>primitive vip-master ocf:heartbeat:IPaddr2 \</div><div> params ip="10.0.0.100" nic="eth1" cidr_netmask="24" \</div><div> op start interval="0s" timeout="260s" on-fail="restart" \</div>
<div> op monitor interval="10s" timeout="260s" on-fail="restart" \</div><div> op stop interval="0s" timeout="260s" on-fail="block"</div><div>primitive vip-rep ocf:heartbeat:IPaddr2 \</div>
<div> params ip="10.0.1.100" nic="eth2" cidr_netmask="24" \</div><div> op start interval="0s" timeout="260s" on-fail="restart" \</div><div> op monitor interval="10s" timeout="260s" on-fail="restart" \</div>
<div> op stop interval="0s" timeout="260s" on-fail="block"</div><div>group master-group vip-master vip-rep</div><div>ms msPostgresql pgsql \</div><div> meta master-max="1" master-node-max="1" clone-max="10" clone-node-max="1" notify="true" target-role="Master"</div>
<div>colocation rsc_colocation-2 inf: master-group msPostgresql:Master</div><div>order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false</div><div>order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false</div>
<div>property $id="cib-bootstrap-options" \</div><div> dc-version="1.1.9-1512.el6-2a917dd" \</div><div> cluster-infrastructure="classic openais (with plugin)" \</div><div> expected-quorum-votes="4" \</div>
<div> stonith-enabled="false" \</div><div> no-quorum-policy="ignore" \</div><div> last-lrm-refresh="1376582085"</div><div>rsc_defaults $id="rsc_defaults-options" \</div>
<div> resource-stickiness="INFINITY" \</div><div> migration-threshold="5"</div><div><br></div><div><br></div><div><br></div><div style>and postgresql configuration:</div><div style><div>listen_addresses = '*'</div>
<div>wal_level = hot_standby</div><div>synchronous_commit = on</div><div>archive_mode = on</div><div>archive_command = 'cp %p /var/lib/pgsql/9.2/data/pg_archive/%f'</div><div>max_wal_senders=5</div><div>wal_keep_segments = 32</div>
<div>hot_standby = on</div><div>restart_after_crash = off</div><div>replication_timeout = 5000 # mseconds</div><div>wal_receiver_status_interval = 2 # seconds</div><div>max_standby_streaming_delay = -1</div><div>
max_standby_archive_delay = -1</div><div>synchronous_commit = on</div><div>restart_after_crash = off</div><div>hot_standby_feedback = on</div><div><br></div><div><br></div><div style>and pg_hba.conf:</div><div style><br></div><div style>
<div><br></div><div># "local" is for Unix domain socket connections only</div><div>local all all trust</div><div># IPv4 local connections:</div><div>host all all 127.0.0.1/32 trust</div>
<div># Allow replication connections from localhost, by a user with the</div><div># replication privilege.</div><div>#local replication postgres peer</div><div>host replication postgres 127.0.0.1/32 trust</div>
<div>host replication replicator 10.0.0.0/8 trust</div><div>host all all 10.0.0.0/8 md5</div>
<div><br></div></div></div><div style><br></div></div><div><br clear="all"><div><br></div>-- <br>GJ
</div></div>