<div dir="ltr">Thank you, useful advice!<div><br></div><div>Logs are attached; they cover the period from when I set maintenance-mode=false until after the node was fenced.</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 14 Jun 2019 at 12:48, Jehan-Guillaume de Rorthais <<a href="mailto:jgdr@dalibo.com">jgdr@dalibo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

<br>

On Fri, 14 Jun 2019 12:27:12 +0200<br>

Tiemen Ruiten <<a href="mailto:t.ruiten@rdmedia.com" target="_blank">t.ruiten@rdmedia.com</a>> wrote:<br>

> I set up a new 3-node PostgreSQL cluster with HA managed by PAF. Nodes are<br>

> named ph-sql-03, ph-sql-04, ph-sql-05. Archive mode is on and writing<br>

> archive files to an NFS share that's mounted on all nodes using pgBackRest.<br>

> <br>

> What I did:<br>

> - Create a pacemaker cluster, cib.xml is attached.<br>

> - Set maintenance-mode=true in pacemaker<br>

<br>

This is not required. Just build your PgSQL replication, shut down the<br>

instances, then add the PAF resource to the cluster.<br>

<br>

But it's not very important here.<br>

<br>

> - Bring up ph-sql-03 with pg_ctl start<br>

> - Take a pg_basebackup on ph-sql-04 and ph-sql-05<br>

> - Create a recovery.conf on ph-sql-04 and ph-sql-05:<br>

> <br>

> standby_mode = 'on'<br>

> primary_conninfo = 'user=replication password=XXXXXXXXXXXXXXXX<br>

> application_name=ph-sql-0x host=10.100.130.20 port=5432 sslmode=prefer<br>

> sslcompression=0 krbsrvname=postgres target_session_attrs=any'<br>

> recovery_target_timeline = 'latest'<br>

> restore_command = 'pgbackrest --stanza=pgdb2 archive-get %f "%p"'<br>

<br>

Sounds fine.<br>

<br>

> - Bring up ph-sql-04 and ph-sql-05 and let recovery finish<br>

> - Set maintenance-mode=false in pacemaker<br>

> - Cluster is now running with ph-sql-03 as master and ph-sql-04/5 as slaves<br>

> At this point I tried a manual failover:<br>

> - pcs resource move --wait --master pgsql-ha ph-sql-04<br>

> Contrary to my expectations, pacemaker attempted to stop pgsqld on<br>

> ph-sql-03.<br>

<br>

Indeed. PostgreSQL doesn't support hot-demote. It has to be shut down and<br>

restarted as a standby.<br>

<br>

> This took longer than the configured timeout of 60s (checkpoint<br>

> hadn't completed yet) and the node was fenced.<br>

<br>

A 60s checkpoint during a maintenance window? That is significant indeed. I would<br>

recommend running a manual checkpoint before triggering the move/switchover.<br>

<br>

> Then I ended up with<br>

> ph-sql-04 and ph-sql-05 both in slave mode and ph-sql-03 rebooting.<br>

> <br>

>  Master: pgsql-ha<br>

>   Meta Attrs: notify=true<br>

>   Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)<br>

>    Attributes: bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/11/data<br>

> recovery_template=/var/lib/pgsql/recovery.conf.pcmk<br>

>    Operations: demote interval=0s timeout=30s (pgsqld-demote-interval-0s)<br>

>                methods interval=0s timeout=5 (pgsqld-methods-interval-0s)<br>

>                monitor interval=15s role=Master timeout=10s<br>

> (pgsqld-monitor-interval-15s)<br>

>                monitor interval=16s role=Slave timeout=10s<br>

> (pgsqld-monitor-interval-16s)<br>

>                notify interval=0s timeout=60s (pgsqld-notify-interval-0s)<br>

>                promote interval=0s timeout=30s (pgsqld-promote-interval-0s)<br>

>                reload interval=0s timeout=20 (pgsqld-reload-interval-0s)<br>

>                start interval=0s timeout=60s (pgsqld-start-interval-0s)<br>

>                stop interval=0s timeout=60s (pgsqld-stop-interval-0s)<br>

> <br>

> I understand I should at least increase the timeout of the stop operation<br>

> for pgsqld, though I'm not sure how much. Checkpoints can take up to 15<br>

> minutes to complete on this cluster. So is 20 minutes reasonable? <br>

<br>

20 minutes is not reasonable for HA. 2 minutes is for a manual procedure.<br>

Timeouts are there so the cluster knows how to react to an unexpected failure,<br>

not during maintenance.<br>

<br>

As I wrote, just add a manual checkpoint in your switchover procedure before<br>

the actual move.<br>
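
<br>

A minimal sketch of such a switchover procedure, not a definitive script. It assumes<br>

it is run on the current master, that psql can connect locally as a superuser (the<br>

"-U postgres" option is an assumption), and that the helper names "run" and<br>

"switchover" and the DRY_RUN variable are purely illustrative. The pcs command is the<br>

one from your message.<br>

```shell
#!/bin/sh
# Illustrative helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical switchover wrapper, to be called on the current master.
# $1 is the node that should take over the master role.
switchover() {
    target="$1"
    # Force a checkpoint first so that the shutdown checkpoint issued while
    # the instance is stopped has little left to flush.
    # Assumption: local superuser access through "psql -U postgres".
    run psql -U postgres -c "CHECKPOINT" || return 1
    # Then ask Pacemaker to move the master role (command from this thread).
    run pcs resource move --wait --master pgsql-ha "$target"
}

# Example: preview the commands without touching the cluster:
#   DRY_RUN=1 switchover ph-sql-04
```

Running it with DRY_RUN=1 first only prints the two commands, which is a cheap way to<br>

check the target node name before the real move.<br>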

<br>

> Any other operations I should increase the timeouts for?<br>

> <br>

> Why didn't pacemaker elect and promote one of the other nodes?<br>

<br>

Do you have logs of all nodes during this time period?<br>

<br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Tiemen Ruiten<br>Systems Engineer<br>R&D Media<br></div></div>