[ClusterLabs] Postgres Cluster PAF problems

damiano giuliani damianogiuliani87 at gmail.com
Wed Jun 30 07:44:28 EDT 2021


Hi Guys,

Sorry to bother you. Unfortunately, I was called in for an issue with a
cluster I built months ago, which was fully functional until last Saturday.

It looks like some applications lost their connection to the master and, as
a result, lost some updates/inserts.

I found the cause in the logs: the psqld-monitor operation timed out after
10000 ms, so the master resource was demoted, the instance was stopped and
then promoted to master again. This caused a few seconds of service
disruption (there was no master during the process described above).

I also noticed a message that keeps recurring in the logs:
Update score of "ltaoperdbsXX" from 990 to 1000 because of a change in the
replication lag
Could this point to some kind of network lag?
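
To see how often the score changes and for which node, the message quoted
above could be pulled out of the log with a small script. This is only a
rough sketch: it assumes the message text appears on a single line exactly
as quoted, and the names SCORE_RE and score_changes are made up for the
example.

```python
import re

# Matches the score-update message quoted above. Whatever prefix the log
# puts before it (timestamp, host, pid) is ignored by using search().
SCORE_RE = re.compile(
    r'Update score of "(?P<node>[^"]+)" from (?P<old>-?\d+) to (?P<new>-?\d+)'
    r' because of a change in the replication lag'
)

def score_changes(lines):
    """Yield (node, old_score, new_score) for each score-update message."""
    for line in lines:
        match = SCORE_RE.search(line)
        if match:
            yield match.group("node"), int(match.group("old")), int(match.group("new"))

# Example input: the message from this mail plus an unrelated line.
sample = [
    'Update score of "ltaoperdbsXX" from 990 to 1000 because of a change in the replication lag',
    'some unrelated log line',
]
changes = list(score_changes(sample))
for node, old, new in changes:
    print(f"{node}: {old} -> {new}")
```

Run against the real log, the lines could be read with gzip.open(path, "rt")
instead of the sample list.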

The network should be 10 Gb/s, and both corosync and the production traffic
run on it. Network bonding is configured on all of the nodes.
PAF version resource-agents-paf-2.3.0-1.rhel7.noarch
Postgres psql (13.1)
pacemaker-1.1.23-1.el7.x86_64
pcs-0.9.169-3.el7.centos.x86_64

I attached the log, which could be useful for digging further.
If someone could point me in the right direction, it would be really
appreciated.

Thanks for the support,
Pepe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync.log-20210630.gz
Type: application/x-gzip
Size: 28532 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20210630/bf9263cf/attachment-0001.bin>