<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 8, 2019 at 10:01 PM Jehan-Guillaume de Rorthais <<a href="mailto:jgdr@dalibo.com">jgdr@dalibo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I should have stepped up to this thread, sorry :)<br></blockquote><div><br></div><div>Really appreciate all the assistance so far.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

The real problem is not how many xacts you will lose during failover, but how we<br>

can choose the best standby to elect. This election needs the timeline and LSN<br>

location of all standbys. And today, to fetch the timeline, we must issue a<br>

CHECKPOINT, then read the controldata file.<br>

<br>

I dug into xlog.c today. Maybe I can write a small extension to get the timeline<br>

from shared memory directly and make pgsqlms use it if it detects it. So people<br>

can decide if they feel like it is too invasive or really needed for<br>

their use case. Maybe in the next release. What do you think? Would it be useful to<br>

you?<br></blockquote><div><br></div><div>Yes, that would be a really useful addition IMO. I would definitely use it. If we can avoid taking a checkpoint, that will save precious minutes during a failover and drastically reduce the risk of timeouts. I would be happy to test it if you want!</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> <br>

> I managed to improve the average time checkpoints are taking already from<br>

> what I mentioned in that thread, mainly by decreasing checkpoint_timeout<br>

> and setting full_page_writes = off; ostensibly not necessary on ZFS.<br>

<br>

Setting "full_page_writes = off" helps lower the amount of WAL produced, not the<br>

amount of writes to sync during the checkpoint. But I am sure it helps your<br>

performance :)<br></blockquote><div><br></div><div>If I'm saturating the IO capacity of my system during a forced checkpoint and full_page_writes = off reduces IO by reducing the amount of WAL, then it should help in an indirect way?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Lowering "checkpoint_timeout" probably helps. As checkpoints occur more<br>

frequently, there is statistically less data to sync when a forced checkpoint<br>

happens during a failover.<br>

<br>

Regards,<br>

<br>

</blockquote></div></div>
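<div>For anyone following the thread, here is a rough sketch of the election idea discussed above: once every standby's timeline and LSN are known, pick the one that is furthest ahead. This is only an illustration, not how pgsqlms actually does it. The names <code>StandbyState</code>, <code>lsn_to_int</code> and <code>best_standby</code>, the node names, and the rule "compare timeline first, then LSN" are all my own assumptions. The only PostgreSQL fact relied on is that a textual LSN is two hexadecimal numbers separated by a slash.</div>

```python
from typing import Iterable, NamedTuple


class StandbyState(NamedTuple):
    """Hypothetical record of what each standby reports."""
    name: str
    timeline: int
    lsn: str  # textual LSN, e.g. "16/B374D848"


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN 'hi/lo' (two hex numbers) to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def best_standby(standbys: Iterable[StandbyState]) -> StandbyState:
    """Pick the standby with the highest (timeline, LSN) pair.

    Assumption: timeline is compared first, LSN only breaks ties.
    """
    return max(standbys, key=lambda s: (s.timeline, lsn_to_int(s.lsn)))


# Made-up example values, for illustration only.
candidates = [
    StandbyState("srv2", 3, "16/B374D848"),
    StandbyState("srv3", 3, "16/B3750000"),
    StandbyState("srv4", 2, "17/00000000"),
]

print(best_standby(candidates).name)
```

<div>The hard part is still the one raised above: getting a trustworthy timeline for each standby without forcing a CHECKPOINT first. The comparison itself is cheap once those values are available.</div>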