[ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

Mon Jul 8 16:01:45 EDT 2019

On Mon, 8 Jul 2019 19:27:00 +0200
Tiemen Ruiten <t.ruiten at tech-lab.io> wrote:

> On Mon, Jul 8, 2019 at 4:59 PM Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> wrote:
> 
> > On Mon, 8 Jul 2019 13:56:49 +0200
> > Tiemen Ruiten <t.ruiten at tech-lab.io> wrote:
> >  
> > > Thank you for the clear explanation and advice.
> > >
> > > Hardware is adequate: 8x SSD and 20 cores per node, but I should note that
> > > the filesystem is ZFS (stripe of mirrors) and there seems to be evidence
> > > that the way the WAL writer allocates space and ZFS' Copy-on-Write nature
> > > don't play nice. A patch that adds several GUCs to improve the situation
> >
> > Wait, how would better WAL write performance help you there? Checkpoints
> > do not write to WAL; they sync data from shared buffers to data files
> > (heap, toast, index, internal stuff, etc). WAL write performance is
> > related to the number of xacts you can achieve per second (if you have
> > synchronous_commit >= local), not to your checkpoint writes.
> >  
> 
> Wow, I completely misunderstood how that works then. This makes much more
> sense (obviously..).

I think this is what Stephen Frost tried to express, without much detail, here:

https://www.postgresql.org/message-id/20190616203749.GV2480%40tamriel.snowman.net
https://www.postgresql.org/message-id/20190616210925.GW2480%40tamriel.snowman.net

> > > (at least it's worth trying, there was some disagreement on the
> > > pgsql-general list over whether it would be helpful in my situation)  
> >
> > Do you have a link to this thread ?
> >  
> 
> https://www.postgresql.org/message-id/flat/CAEkBuzeno6ztiM1g4WdzKRJFgL8b2nfePNU%3Dq3sBiEZUm-D-sQ%40mail.gmail.com

I should have stepped into that thread, sorry :)
The real problem is not how many xacts you will lose during failover, but how we
can choose the best standby to elect. This election needs the timeline and LSN
location of all standbys. And today, to fetch the timeline, we must issue a
CHECKPOINT, then read the controldata file.
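
To illustrate the election described above, here is a minimal sketch in Python.
It is not the pgsqlms code: the names StandbyState, parse_timeline, lsn_to_int
and elect_best_standby are hypothetical, and it assumes the timeline is taken
from the "Latest checkpoint's TimeLineID" line of the pg_controldata output
(with untranslated messages) and that the LSN is in the usual "hi/lo"
hexadecimal form.

```python
import re
from typing import Iterable, NamedTuple


class StandbyState(NamedTuple):
    """Hypothetical record of what is known about one standby."""
    name: str
    timeline: int
    lsn: str  # textual LSN such as "16/B374D848"


def parse_timeline(controldata_output: str) -> int:
    """Extract the latest checkpoint's timeline from pg_controldata output."""
    match = re.search(
        r"^Latest checkpoint's TimeLineID:\s+(\d+)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("TimeLineID not found in pg_controldata output")
    return int(match.group(1))


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN ("hi/lo", both hexadecimal) to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def elect_best_standby(standbys: Iterable[StandbyState]) -> StandbyState:
    """Pick the standby with the highest timeline, then the highest LSN."""
    return max(standbys, key=lambda s: (s.timeline, lsn_to_int(s.lsn)))
```

Comparing the (timeline, LSN) pair rather than the LSN alone is the point: a
standby on an older timeline must not win just because its LSN is numerically
higher.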

I dug into xlog.c today. Maybe I can write a small extension to get the timeline
from shared memory directly, and make pgsqlms use it when it detects that the
extension is installed. That way people can decide whether it is too invasive or
really needed for their use case. Maybe in the next release. What do you think?
Would it be useful to you?

> 
> I managed to improve the average time checkpoints are taking already from
> what I mentioned in that thread, mainly by decreasing checkpoint_timeout
> and setting full_page_writes = off; ostensibly not necessary on ZFS.

Setting "full_page_writes = off" helps lower the amount of WAL produced, not the
amount of writes to sync during the checkpoint. But I am sure it helps your
performance :)

Lowering "checkpoint_timeout" probably helps. As checkpoints occur more
frequently, there is statistically less data to sync when a forced checkpoint
happens during a failover.
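
A rough back-of-the-envelope sketch of that statistical argument, in Python. It
is a deliberately simplistic toy model, not a measurement: it assumes buffers
are dirtied at a constant rate, that each checkpoint flushes everything at
once, and that the failover happens at a uniformly random instant between two
checkpoints. The function name and the numbers are hypothetical.

```python
def expected_backlog_mb(dirty_rate_mb_per_s: float,
                        checkpoint_timeout_s: float) -> float:
    """Average amount of unsynced data at a random instant between checkpoints.

    Under the toy model, the backlog grows linearly from 0 to
    rate * timeout, so its average is half of that maximum.
    """
    return dirty_rate_mb_per_s * checkpoint_timeout_s / 2


# Hypothetical workload dirtying 10 MB/s of shared buffers.
backlog_5min = expected_backlog_mb(10, 300)  # checkpoint_timeout = 5min
backlog_1min = expected_backlog_mb(10, 60)   # checkpoint_timeout = 1min
```

In this model the expected amount to sync during the forced checkpoint scales
linearly with "checkpoint_timeout". Real behaviour also depends on spread
checkpoints and on pages being dirtied more than once.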

Regards,