[ClusterLabs] PostgreSQL cluster with Pacemaker+PAF problems

Thu Mar 5 06:21:14 EST 2020

Hello community,

I would be very grateful for your help.

I have configured a PostgreSQL cluster with Pacemaker+PAF. The Pacemaker
configuration is the following (from
https://clusterlabs.github.io/PAF/Quick_Start-CentOS-7.html):

# pgsqld
pcs -f cluster1.xml resource create pgsqld ocf:heartbeat:pgsqlms \
    bindir=/usr/pgsql-9.6/bin pgdata=/var/lib/pgsql/9.6/data     \
    op start timeout=60s                                         \
    op stop timeout=60s                                          \
    op promote timeout=30s                                       \
    op demote timeout=120s                                       \
    op monitor interval=15s timeout=10s role="Master"            \
    op monitor interval=16s timeout=10s role="Slave"             \
    op notify timeout=60s

# pgsql-ha
pcs -f cluster1.xml resource master pgsql-ha pgsqld notify=true

pcs -f cluster1.xml resource create pgsql-master-ip ocf:heartbeat:IPaddr2 \
    ip=192.168.122.50 cidr_netmask=24 op monitor interval=10s

pcs -f cluster1.xml constraint colocation add pgsql-master-ip with \
    master pgsql-ha INFINITY
pcs -f cluster1.xml constraint order promote pgsql-ha then start \
    pgsql-master-ip symmetrical=false kind=Mandatory
pcs -f cluster1.xml constraint order demote pgsql-ha then stop \
    pgsql-master-ip symmetrical=false kind=Mandatory

I use the fence_xvm fencing agent, with the following configuration:

pcs -f cluster1.xml stonith create fence1 fence_xvm \
    pcmk_host_check="static-list" pcmk_host_list="srv1" port="srv-m1" \
    multicast_address=224.0.0.2
pcs -f cluster1.xml stonith create fence2 fence_xvm \
    pcmk_host_check="static-list" pcmk_host_list="srv2" port="srv-m2" \
    multicast_address=224.0.0.2

pcs -f cluster1.xml constraint location fence1 avoids srv1=INFINITY
pcs -f cluster1.xml constraint location fence2 avoids srv2=INFINITY

The cluster is behaving in a strange way. When I manually fence the master
node (or shut it down ungracefully), then after unfencing/starting it the
node has status Failed/blocked and is constantly fenced (restarted) by the
fencing agent. Shouldn't the cluster recover as Master/Slave after fencing
without problems? The error log says that the demote action on the node has
failed:

warning: Action 10 (pgsqld_demote_0) on server1 failed (target: 0 vs. rc: 1): Error
warning: Processing failed op demote for pgsqld:1 on server1: unknown error (1)
warning: Forcing pgsqld:1 to stop after a failed demote action
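
To make warnings like these easier to scan, here is a minimal sketch that pulls the node, the operation, and the expected versus actual return code out of each "Action ... failed" line. The function name summarize_failed_actions is hypothetical and not part of Pacemaker or PAF. The sketch assumes each log entry sits on a single line, as it would in the actual log file, and it matches only the message format quoted in this mail.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker or PAF): read log text on stdin
# and print "<node> <operation> expected=<target> got=<rc>" for every line of
# the form "Action N (op) on node failed (target: X vs. rc: Y)".
summarize_failed_actions() {
    sed -n 's/.*Action [0-9][0-9]* (\([^)]*\)) on \([^ ]*\) failed (target: \([0-9][0-9]*\) vs\. rc: \([0-9][0-9]*\)).*/\2 \1 expected=\3 got=\4/p'
}

# Example input: the first warning quoted above, as a single line.
printf '%s\n' \
    'warning: Action 10 (pgsqld_demote_0) on server1 failed (target: 0 vs. rc: 1): Error' \
    | summarize_failed_actions
```

Lines that do not match this format are dropped without any message, so an empty result only means that no line matched the pattern, not that no action failed.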

Is this a cluster misconfiguration? Any idea would be greatly appreciated.

Thank you in advance,

Aleksandra