[ClusterLabs] Postgresql+Pacemaker+Corosync unexpected behavior

Mon Oct 2 01:54:38 EDT 2023

Hello!

I have configured PostgreSQL + Pacemaker + Corosync on 3 nodes: 2 of them
form the PostgreSQL HA cluster and one acts as a witness.

3 nodes configured
4 resource instances configured

Online: [ witness wizard1 wizard2 ]

Full list of resources:

ClusterIP (ocf::heartbeat:IPaddr2): Started wizard1
Master/Slave Set: mspgsql [pgsql]
Masters: [ wizard1 ]
Slaves: [ wizard2 ]
ExternalIP (ocf::heartbeat:IPaddr2): Started wizard1

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

aptitude versions pacemaker corosync postgresql
Package corosync:
i A 2.4.2-3+deb9u1    stable    900

Package pacemaker:
i   1.1.24-0+deb9u1   stable    900

Package postgresql:
i A 9.6+200astra8     stable    900

After the slave is rebooted, it rejoins the cluster in this state:

Node Attributes:
* Node witness:
* Node wizard1:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 00000000070028D8
+ pgsql-status : PRI
* Node wizard2:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async

although at the same time PostgreSQL itself reports synchronous replication:

postgres=# SELECT pid,usename,application_name,state,sync_state FROM pg_stat_replication;
 pid  | usename  | application_name |   state   | sync_state
------+----------+------------------+-----------+------------
 6569 | postgres | wizard2          | streaming | sync
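
The mismatch between the two views can be checked mechanically. Below is a minimal sketch, assuming the node attribute listing has exactly the text format shown above; the function names `parse_node_attributes` and `replication_views_agree` are hypothetical helpers, not part of Pacemaker or the pgsql resource agent.

```python
def parse_node_attributes(text):
    """Parse a 'Node Attributes:' listing into {node: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("* Node "):
            # "* Node wizard2:" -> "wizard2"
            current = line[len("* Node "):].rstrip(":").strip()
            nodes[current] = {}
        elif line.startswith("+ ") and current is not None:
            # Split on the first ':' only, so values like "HS:async" stay whole.
            name, _, value = line[2:].partition(":")
            nodes[current][name.strip()] = value.strip()
    return nodes


def replication_views_agree(attrs, node, pg_sync_state):
    """Compare pgsql-data-status with sync_state from pg_stat_replication."""
    status = attrs.get(node, {}).get("pgsql-data-status", "")
    # "STREAMING|SYNC" -> "sync", "STREAMING|ASYNC" -> "async"
    cluster_mode = status.rpartition("|")[2].lower()
    return cluster_mode == pg_sync_state.lower()


# Attribute listing copied from the state seen right after the reboot.
SAMPLE = """\
Node Attributes:
* Node witness:
* Node wizard1:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-status : PRI
* Node wizard2:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async
"""

attrs = parse_node_attributes(SAMPLE)
# PostgreSQL reported sync_state = "sync" for wizard2 at the same moment.
print(replication_views_agree(attrs, "wizard2", "sync"))
```

For the state above the check reports a disagreement: the cluster attribute says ASYNC while PostgreSQL says sync.
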

If I run the command "sudo pcs resource cleanup" on the slave, the cluster
goes into this state:

Node Attributes:
* Node witness:
* Node wizard1:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 00000000070028D8
+ pgsql-status : PRI
* Node wizard2:
+ master-pgsql : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync

Sometimes the value of master-pgsql remains -INFINITY after running
"pcs resource cleanup". In that case, running "pcs resource cleanup" a
second time sets master-pgsql to 100.
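
The manual retry described above can be written down as a small loop. This is only a sketch of the workaround, not a fix for the underlying cause; `read_score` and `run_cleanup` are hypothetical callables that the caller would have to supply (for example, wrappers that read the master-pgsql attribute and invoke "sudo pcs resource cleanup").

```python
def cleanup_until_promotable(read_score, run_cleanup, max_attempts=3):
    """Repeat the cleanup until master-pgsql is no longer -INFINITY.

    read_score and run_cleanup are hypothetical callables supplied by the
    caller. Returns the number of cleanups that were run, or None if the
    score was still -INFINITY after max_attempts cleanups.
    """
    for attempt in range(max_attempts + 1):
        if read_score() != "-INFINITY":
            return attempt
        if attempt < max_attempts:
            run_cleanup()
    return None


# Simulated cluster: the score recovers only after the second cleanup,
# which mirrors the behavior described in the paragraph above.
scores = ["-INFINITY", "-INFINITY", "100"]
state = {"cleanups": 0}


def fake_read_score():
    return scores[min(state["cleanups"], len(scores) - 1)]


def fake_run_cleanup():
    state["cleanups"] += 1


cleanups_needed = cleanup_until_promotable(fake_read_score, fake_run_cleanup)
print(cleanups_needed)
```
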

What could be the cause of this behavior, and how serious is it in terms of
data safety? PostgreSQL reports that synchronous replication is running.
May I ignore this behavior?

And the second issue: when the slave is rebooted, the following entry
appears in the PostgreSQL log:
postgres@template1 FATAL: the database system is starting up
What does this mean?

Best regards,
Sergey Cherukhin