[ClusterLabs] PostgreSQL cluster with Pacemaker+PAF problems

Thu Mar 5 06:40:56 EST 2020

Hello,

On Thu, 5 Mar 2020 12:21:14 +0100
Aleksandra C <aleksandra29c at gmail.com> wrote:
[...]
> I would be very happy to use some help from you.
> 
> I have configured PostgreSQL cluster with Pacemaker+PAF. The pacemaker
> configuration is the following (from
> https://clusterlabs.github.io/PAF/Quick_Start-CentOS-7.html)
> 
> # pgsqld
> pcs -f cluster1.xml resource create pgsqld ocf:heartbeat:pgsqlms \
>     bindir=/usr/pgsql-9.6/bin pgdata=/var/lib/pgsql/9.6/data     \
>     op start timeout=60s                                         \
>     op stop timeout=60s                                          \
>     op promote timeout=30s                                       \
>     op demote timeout=120s                                       \
>     op monitor interval=15s timeout=10s role="Master"            \
>     op monitor interval=16s timeout=10s role="Slave"             \
>     op notify timeout=60s

If you can, I would recommend using PostgreSQL v11 or v12. Support for v12 is
in PAF 2.3rc2, which is supposed to be released next week.

[...]
> The cluster is behaving in a strange way. When I manually fence the master
> node (or shut it down ungracefully), after unfencing/starting, the node has
> status Failed/blocked and the node is constantly fenced (restarted) by the
> fencing agent. Should the fencing recover the cluster as Master/Slave
> without problem?

I suppose a failover occurred after the ungraceful shutdown? The old primary is
probably seen as crashed from PAF's point of view.
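
One way to check this on the old primary, while PostgreSQL is stopped there, is
to read the cluster state recorded in its control file. The sketch below is
only an illustration: the paths come from the pgsqld resource definition quoted
above, and the helper name cluster_state is made up for this example. A state
of "in production" on an instance that is not running means it was not shut
down cleanly.

```shell
#!/bin/sh
# Hypothetical helper: extract the "Database cluster state" value from
# pg_controldata output read on stdin.
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Paths taken from the pgsqld resource definition (bindir and pgdata).
PG_CONTROLDATA=/usr/pgsql-9.6/bin/pg_controldata
PGDATA=/var/lib/pgsql/9.6/data

# Only run if the binary is present on this node. LC_ALL=C keeps the
# output labels in English so the sed pattern matches.
if [ -x "$PG_CONTROLDATA" ]; then
    LC_ALL=C "$PG_CONTROLDATA" "$PGDATA" | cluster_state
fi
```

Run it as a user allowed to read the data directory (usually postgres).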

Could you share the detailed pgsqlms log?
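
The resource agent's messages end up in the Pacemaker logs. A rough sketch to
collect them follows; the file locations are assumptions (they depend on your
distribution and logging setup), and pgsqlms_lines is a hypothetical helper
name used only here.

```shell
#!/bin/sh
# Hypothetical helper: print every line mentioning pgsqlms from the given
# log files. Missing or unreadable files are skipped silently.
pgsqlms_lines() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            # grep exits non-zero when nothing matches; that is not an error here.
            grep -h 'pgsqlms' "$f" || true
        fi
    done
    return 0
}

# Candidate log locations (assumptions; adjust to your setup).
pgsqlms_lines /var/log/cluster/corosync.log \
              /var/log/pacemaker.log \
              /var/log/pacemaker/pacemaker.log
```

Running it on each node, and keeping the lines around the time of the fencing,
should be enough to see what the agent reported.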

[...]
> Is this a cluster misconfiguration? Any idea would be greatly appreciated.

I don't think so. Make sure to look at
https://clusterlabs.github.io/PAF/administration.html#failover

Regards,