<div dir="ltr"><div dir="ltr">On Tue, Jul 9, 2019 at 4:21 PM Jehan-Guillaume de Rorthais <<a href="mailto:jgdr@dalibo.com" target="_blank">jgdr@dalibo.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, 9 Jul 2019 13:22:06 +0200<br>

Tiemen Ruiten <<a href="mailto:t.ruiten@tech-lab.io" target="_blank">t.ruiten@tech-lab.io</a>> wrote:<br>

<br>

> On Mon, Jul 8, 2019 at 10:01 PM Jehan-Guillaume de Rorthais <<a href="mailto:jgdr@dalibo.com" target="_blank">jgdr@dalibo.com</a>><br>

...<br>

> > I dug into xlog.c today. Maybe I can write a small extension to get the<br>

> > timeline from shared memory directly and make pgsqlms use it if it detects<br>

> > it. That way, people can decide whether it is too invasive or really needed<br>

> > for their use case. Maybe in the next release. What do you think? Would it<br>

> > be useful to you?<br>

> >  <br>

> <br>

> Yes, that would be a really useful addition IMO. I would definitely use it.<br>

> If we can avoid taking a checkpoint, that will save precious minutes during<br>

> a failover and drastically reduce the risk of timeouts. I would be<br>

> happy to test it if you want!<br>

<br>

OK, thanks. Not sure when I'll have time to work on this. But I'll stay in<br>

touch with you then.<br></blockquote><div><br></div><div>Great!</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

I have to work on the v12 support as well :/<br>

<br>

> > > I have already managed to reduce the average checkpoint duration compared to<br>

> > > what I mentioned in that thread, mainly by decreasing checkpoint_timeout<br>

> > > and setting full_page_writes = off (ostensibly not necessary on ZFS).<br>

> ><br>

> > Setting full_page_writes = off helps lower the amount of WAL produced, not the<br>

> > amount of writes to sync during the checkpoint. But I am sure it helps your<br>

> > performance :)<br>

> <br>

> If I'm saturating the IO capacity of my system during a forced checkpoint<br>

> and full_page_writes = off reduces IO by reducing the amount of WAL, then<br>

> it should help in an indirect way?<br>

<br>

The master is supposed to be gone during a failover, with neither reads nor<br>

writes. </blockquote><div><br></div><div>OK, I didn't consider this.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The checkpoint occurs on each standby to force sync their<br>

controldata. The checkpoint itself does not write to WAL or read from it. Am I<br>

forgetting something obvious?<br>

<br>

Maybe you can have some writes if the standby needs to sync the last received<br>

WAL and some reads if the standby was lagging on replay... But it shouldn't be<br>

much...<br></blockquote><div><br></div><div>I double-checked monitoring data: there was approximately one minute of replication lag on one slave and two minutes of replication lag on the other slave when the original issue occurred. By the way, I'm still seeing worrying amounts of replication lag on both slaves at times (usually not on both at the same time), which is really puzzling since all hardware and configuration are identical. Anyway, that's something for another thread/mailing list I suppose :)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

-- <br>

Jehan-Guillaume de Rorthais<br>

Dalibo<br>

</blockquote></div></div>