[ClusterLabs] PAF fails to promote slave: Can not get current node LSN location
t.ruiten at tech-lab.io
Tue Jul 9 07:22:06 EDT 2019
On Mon, Jul 8, 2019 at 10:01 PM Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
> I should have stepped up in this thread, sorry :)
Really appreciate all the assistance so far.
> The real problem is not how many xacts you will lose during failover, but
> how we can choose the best standby to elect. This election needs the
> timeline and LSN location of all standbys. And today, to fetch the
> timeline, we must issue a CHECKPOINT, then read the controldata file.
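The election described above boils down to comparing (timeline, location) pairs across the standbys. Here is a minimal sketch of that comparison in Python. It is not PAF's actual code (pgsqlms is written in Perl); the `StandbyState` type, the node names and the `elect_best_standby` helper are hypothetical and only illustrate the idea. The one thing to watch is that LSNs are two hex numbers and must be compared numerically, not as strings.

```python
from typing import Iterable, NamedTuple


class StandbyState(NamedTuple):
    name: str       # node name (hypothetical, for illustration only)
    timeline: int   # timeline ID the standby is currently on
    lsn: str        # textual LSN such as "0/3000060"


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN 'hi/lo' (two hex numbers) to one integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def elect_best_standby(standbys: Iterable[StandbyState]) -> StandbyState:
    """Pick the standby with the highest timeline, then the highest LSN."""
    return max(standbys, key=lambda s: (s.timeline, lsn_to_int(s.lsn)))


candidates = [
    StandbyState("srv2", 1, "0/A000000"),
    StandbyState("srv3", 1, "0/10000000"),
]
best = elect_best_standby(candidates)
print(best.name)
```

With these two candidates a plain string comparison would pick srv2 ("A" sorts after "1"), while the numeric comparison picks srv3, which is the one that is actually further ahead.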
> I dug into xlog.c today. Maybe I can write a small extension to get the
> timeline from shared memory directly and make pgsqlms use it if it detects it. So
> can decide if they feel like it is too invasive or really needed for
> their usecase. Maybe in next release. What do you think? Would it be
> useful to
Yes, that would be a really useful addition IMO. I would definitely use it.
If we can avoid taking a checkpoint, that will save precious minutes during
a failover, and the risk of timeouts would be drastically reduced. I would be
happy to test it if you want!
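For reference, the "read the controldata file" half of the current approach can be sketched like this in Python. This is only an illustration, not what pgsqlms does internally: the helper names `parse_timeline` and `read_timeline` are made up, and it assumes `pg_controldata` is on the PATH and prints its usual "Latest checkpoint's TimeLineID:" line. The CHECKPOINT that has to come first is deliberately left out.

```python
import os
import re
import subprocess

# Matches the TimeLineID line only, not "Latest checkpoint's PrevTimeLineID:".
_TLI_RE = re.compile(r"^Latest checkpoint's TimeLineID:\s+(\d+)\s*$", re.MULTILINE)


def parse_timeline(controldata_output: str) -> int:
    """Extract the latest checkpoint's timeline ID from pg_controldata output."""
    match = _TLI_RE.search(controldata_output)
    if match is None:
        raise ValueError("TimeLineID line not found in pg_controldata output")
    return int(match.group(1))


def read_timeline(pgdata: str) -> int:
    """Run pg_controldata on a data directory and return the timeline ID."""
    # Force the C locale so the field labels are not translated.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["pg_controldata", pgdata],
        check=True,
        capture_output=True,
        text=True,
        env=env,
    )
    return parse_timeline(result.stdout)
```

Something like `read_timeline("/path/to/pgdata")` (hypothetical path) would then return the timeline as of the last checkpoint, which is exactly why the forced checkpoint is needed before reading it.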
> > I managed to improve the average time checkpoints are taking already from
> > what I mentioned in that thread, mainly by decreasing checkpoint_timeout
> > and setting full_page_writes = off; ostensibly not necessary on ZFS.
> Turning "full_page_writes" off helps lower the amount of WAL produced, not
> the amount of writes to sync during the checkpoint. But I am sure it helps
> performance :)
If I'm saturating the IO capacity of my system during a forced checkpoint
and full_page_writes = off reduces IO by reducing the amount of WAL, then
it should help in an indirect way?
> Lowering "checkpoint_timeout" probably helps. As checkpoints occur more
> frequently, there is statistically less data to sync when a forced
> checkpoint happens during a failover.
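A back-of-the-envelope way to see the "statistically less data" point, as a toy model in Python. The assumptions are mine and quite crude: a constant write rate, every write dirtying new pages, checkpoints triggered only by the timeout, and the failover arriving at a uniformly random moment in the cycle. It also ignores that a running checkpoint spreads its writes over the interval, so treat the result as a rough upper bound, not a prediction. The function name and the numbers are hypothetical.

```python
def expected_dirty_mb(write_rate_mb_s: float, checkpoint_timeout_s: float) -> float:
    """Rough upper bound on data dirtied since the last checkpoint.

    A failover at a uniformly random moment lands, on average, halfway
    through the checkpoint cycle.
    """
    mean_elapsed_s = checkpoint_timeout_s / 2.0
    return write_rate_mb_s * mean_elapsed_s


# Hypothetical workload dirtying 10 MB/s, with two candidate timeouts.
print(expected_dirty_mb(10, 900))  # checkpoint_timeout = 15min
print(expected_dirty_mb(10, 300))  # checkpoint_timeout = 5min
```

Under these assumptions, cutting the timeout to a third cuts the expected amount left to sync at failover to a third as well.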