[ClusterLabs] PostgreSQL PAF failover issue
Tiemen Ruiten
t.ruiten at rdmedia.com
Fri Jun 14 10:43:23 EDT 2019
Right, so I may have been too quick to give up. I set maintenance mode back
on and promoted ph-sql-04 manually. Unfortunately I don't have the logs of
ph-sql-03 anymore because I reinitialized it.
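For the record, the manual promotion was along these lines (quoting from
memory, so the exact invocations may have differed):

  pcs property set maintenance-mode=true
  # then on ph-sql-04:
  /usr/pgsql-11/bin/pg_ctl promote -D /var/lib/pgsql/11/data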
You mention that demote timeout should be start timeout + stop timeout.
Start/stop are 60s, so that would mean 120s for demote timeout? Or 30s for
start/stop?
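If 120s is the right value, I assume I'd apply it with something like this
(untested):

  # demote timeout = stop timeout + start timeout = 60s + 60s
  pcs resource update pgsqld op demote interval=0s timeout=120s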
On Fri, 14 Jun 2019 at 15:55, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
wrote:
> On Fri, 14 Jun 2019 13:18:09 +0200
> Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:
>
> > Thank you, useful advice!
> >
> > Logs are attached; they cover the period from when I set
> > maintenance-mode=false until after the node fencing.
>
> Switchover started @ 09:51:43
>
> In fact, the action that timed out was the demote action, not the stop
> action:
>
> pgsqld_demote_0:31997 - timed out after 30000ms
>
> As explained, the demote does a stop/start because PgSQL doesn't support
> hot demotion. So your demote timeout should be stop timeout + start
> timeout. I would recommend 60s there instead of 30s.
>
> After Pacemaker decided what to do next, you had some more timeouts. I
> suppose the PgSQL logs should give some more explanation of what happened
> during these long minutes:
>
> pgsqld_notify_0:37945 - timed out after 60000ms
> ...
> pgsqld_stop_0:7783 - timed out after 60000ms
>
> It is 09:54:16. Now pengine becomes angry and wants to make sure pgsql is
> stopped on node 03:
>
> pengine: warning: unpack_rsc_op_failure: Processing failed stop of pgsqld:1 on ph-sql-03: unknown error | rc=1
> ...
> pengine: warning: pe_fence_node: Cluster node ph-sql-03 will be fenced: pgsqld:1 failed there
> ...
> pengine: warning: stage6: Scheduling Node ph-sql-03 for STONITH
> ...
> pengine: notice: native_stop_constraints: Stop of failed resource pgsqld:1 is implicit after ph-sql-03 is fenced
>
>
> From there, node 03 is down for 9 minutes; it comes back at 10:02:59.
>
> Meanwhile, @ 09:54:29, node 5 took over the DC role and decided to promote
> pgsql
> on node 4 as expected.
>
> The pre-promote notify actions are triggered, but at 09:55:24, the
> transition is canceled because of maintenance mode:
>
> Transition aborted by cib-bootstrap-options-maintenance-mode doing modify maintenance-mode=true
>
> Soon after, both notify actions timed out on both nodes:
>
> warning: child_timeout_callback: pgsqld_notify_0 process (PID 38838) timed out
>
> Not sure what happened on your side that could explain these timeouts, but
> because the cluster was in maintenance mode, there was some human
> interaction ongoing anyway.
>
> > On Fri, 14 Jun 2019 at 12:48, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> > wrote:
> >
> > > Hi,
> > >
> > > On Fri, 14 Jun 2019 12:27:12 +0200
> > > Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:
> > > > I set up a new 3-node PostgreSQL cluster with HA managed by PAF. Nodes
> > > > are named ph-sql-03, ph-sql-04, ph-sql-05. Archive mode is on, writing
> > > > archive files with pgBackRest to an NFS share that's mounted on all
> > > > nodes.
> > > >
> > > > What I did:
> > > > - Create a pacemaker cluster, cib.xml is attached.
> > > > - Set maintenance-mode=true in pacemaker
> > >
> > > This is not required. Just build your PgSQL replication, shut down the
> > > instances, then add the PAF resource to the cluster.
> > >
> > > But it's not very important here.
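> > >
> > > (Untested sketch of what I mean, reusing the names from your cib.xml:
> > > build the replication with PostgreSQL alone, stop every instance with
> > > pg_ctl stop, then:
> > >
> > >   pcs resource create pgsqld ocf:heartbeat:pgsqlms \
> > >       bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/11/data \
> > >       recovery_template=/var/lib/pgsql/recovery.conf.pcmk
> > >   pcs resource master pgsql-ha pgsqld notify=true
> > >
> > > plus your op timeouts. No maintenance-mode juggling needed.)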
> > >
> > > > - Bring up ph-sql-03 with pg_ctl start
> > > > - Take a pg_basebackup on ph-sql-04 and ph-sql-05
> > > > - Create a recovery.conf on ph-sql-04 and ph-sql-05:
> > > >
> > > > standby_mode = 'on'
> > > > primary_conninfo = 'user=replication password=XXXXXXXXXXXXXXXX application_name=ph-sql-0x host=10.100.130.20 port=5432 sslmode=prefer sslcompression=0 krbsrvname=postgres target_session_attrs=any'
> > > > recovery_target_timeline = 'latest'
> > > > restore_command = 'pgbackrest --stanza=pgdb2 archive-get %f "%p"'
> > >
> > > Sounds fine.
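> > >
> > > For reference, that base backup step is typically just something like
> > > this (hypothetical invocation, your flags may differ):
> > >
> > >   pg_basebackup -h 10.100.130.20 -U replication \
> > >       -D /var/lib/pgsql/11/data -X stream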
> > >
> > > > - Bring up ph-sql-04 and ph-sql-05 and let recovery finish
> > > > - Set maintenance-mode=false in pacemaker
> > > > - Cluster is now running with ph-sql-03 as master and ph-sql-04/5 as slaves
> > > >
> > > > At this point I tried a manual failover:
> > > > - pcs resource move --wait --master pgsql-ha ph-sql-04
> > > >
> > > > Contrary to my expectations, pacemaker attempted to stop pgsqld on
> > > > ph-sql-03.
> > >
> > > Indeed. PostgreSQL doesn't support hot demote. It has to be shut down
> > > and restarted as a standby.
> > >
> > > > This took longer than the configured timeout of 60s (checkpoint
> > > > hadn't completed yet) and the node was fenced.
> > >
> > > 60s of checkpoint during a maintenance window? That's significant,
> > > indeed. I would recommend doing a manual checkpoint before triggering
> > > the move/switchover.
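> > >
> > > As a sketch, reusing your own move command:
> > >
> > >   sudo -iu postgres psql -c 'CHECKPOINT;'       # on the current master
> > >   pcs resource move --wait --master pgsql-ha ph-sql-04
> > >
> > > The manual CHECKPOINT does most of the flush work up front, so the
> > > shutdown checkpoint during the demote stays short.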
> > >
> > > > Then I ended up with
> > > > ph-sql-04 and ph-sql-05 both in slave mode and ph-sql-03 rebooting.
> > > >
> > > > Master: pgsql-ha
> > > >  Meta Attrs: notify=true
> > > >  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
> > > >   Attributes: bindir=/usr/pgsql-11/bin pgdata=/var/lib/pgsql/11/data recovery_template=/var/lib/pgsql/recovery.conf.pcmk
> > > >   Operations: demote interval=0s timeout=30s (pgsqld-demote-interval-0s)
> > > >               methods interval=0s timeout=5 (pgsqld-methods-interval-0s)
> > > >               monitor interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
> > > >               monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
> > > >               notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
> > > >               promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
> > > >               reload interval=0s timeout=20 (pgsqld-reload-interval-0s)
> > > >               start interval=0s timeout=60s (pgsqld-start-interval-0s)
> > > >               stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
> > > >
> > > > I understand I should at least increase the timeout of the stop
> > > > operation for pgsqld, though I'm not sure by how much. Checkpoints can
> > > > take up to 15 minutes to complete on this cluster. So is 20 minutes
> > > > reasonable?
> > >
> > > 20 minutes is not reasonable for HA; 2 minutes is, for a manual
> > > procedure. Timeouts are there so the cluster knows how to react to
> > > unexpected failures, not during maintenance.
> > >
> > > As I wrote, just add a manual checkpoint to your switchover procedure,
> > > before the actual move.
> > >
> > > > Any other operations I should increase the timeouts for?
> > > >
> > > > Why didn't pacemaker elect and promote one of the other nodes?
> > >
> > > Do you have logs of all nodes during this time period?
> > >
> > >
> >
>
>
>
> --
> Jehan-Guillaume de Rorthais
> Dalibo
>
--
Tiemen Ruiten
Systems Engineer
R&D Media