[ClusterLabs] PostgreSQL PAF failover issue

Fri Jun 14 10:53:53 EDT 2019

On Fri, 14 Jun 2019 16:43:23 +0200
Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:

> Right, so I may have been too fast to give up. I set maintenance mode back
> on and promoted ph-sql-04 manually. Unfortunately I don't have the logs of
> ph-sql-03 anymore because I reinitialized it.
> You mention that demote timeout should be start timeout + stop timeout.
> Start/stop are 60s, so that would mean 120s for demote timeout? Or 30s for
> start/stop?

Considering your slow checkpoint, set the timeouts high until you have fixed it:

  demote=120s
  start/stop=60s
  notify=60s
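
As a rough sketch only, assuming you manage the cluster with pcs and that your
PAF resource is called "pgsqld" (a placeholder name, use your own resource id),
the timeouts above could be applied like this:

```shell
# Assumption: "pgsqld" is the id of the pgsqlms (PAF) resource in your cluster.
# Replace it with the id shown by "pcs status".
pcs resource update pgsqld \
    op start timeout=60s \
    op stop timeout=60s \
    op demote timeout=120s \
    op notify timeout=60s

# Check the operation timeouts actually stored in the CIB
# (works whatever your pcs version is).
cibadmin --query --scope resources | grep -E 'name="(start|stop|demote|notify)"'
```

Here demote (120s) equals start + stop (60s + 60s), which matches the rule
discussed earlier in this thread.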

Another good practice would be to set up a centralized log server using
e.g. rsyslog. It avoids losing messages during fencing, and you can gather
the logs from all the nodes in one place. See the vagrant files and scripts in
the PAF/test/ directory of the repository for a demo setup.
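
A minimal sketch of the client side, to be run on each cluster node. It assumes
a log server reachable as "logs.example.com" on TCP port 514; both the hostname
and the file name 90-forward.conf are placeholders, not taken from the PAF demo
setup:

```shell
# Assumption: logs.example.com is your central rsyslog server (placeholder),
# already configured to accept syslog messages over TCP on port 514.
# "@@" forwards over TCP, a single "@" would forward over UDP.
cat <<'EOF' | sudo tee /etc/rsyslog.d/90-forward.conf
*.* @@logs.example.com:514
EOF

# Reload rsyslog so the forwarding rule is taken into account.
sudo systemctl restart rsyslog
```

Local logging stays as it is with this rule, so each node keeps its own copy
in addition to what is sent to the central server.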