[ClusterLabs] Re: [EXT] Postgres Cluster PAF problems

Wed Jun 30 08:17:35 EDT 2021

>>> damiano giuliani <damianogiuliani87 at gmail.com> wrote on 30.06.2021 at 13:44
in message
<CAG=zYNNe=azZaLEhe3JzKaHnSEv88Nr+yEo0m06hLjL4L11PCA at mail.gmail.com>:
> Hi Guys,
> 
> Sorry to bother you. Unfortunately I was called in for an issue with a
> cluster I set up months ago, which was fully functional until last Saturday.
> 
> It looks like some applications lost their connection to the master and
> lost some updates/inserts.
> 
> I found the cause in the logs: the psqld-monitor timed out after
> 10000 ms and the master resource was demoted, the instance was stopped and
> then promoted to master again, causing a few seconds of service disruption
> (no master during the described process).

Well, I think YOU have to find out why the monitor timed out. Maybe the disks being used were too busy, maybe the memory was tight, ...
WE don't know.
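
One way to start narrowing this down is to time a cheap check against the instance repeatedly and compare the result with the 10000 ms limit from the log. The following is only a minimal sketch under assumptions: the function name `time_command` and the placeholder command are hypothetical and not part of PAF or Pacemaker, and the command should be replaced with whatever check makes sense on the node.

```python
import subprocess
import sys
import time

# Assumption: 10000 ms is the monitor timeout quoted in the original report.
MONITOR_TIMEOUT_MS = 10000


def time_command(cmd, timeout_ms=MONITOR_TIMEOUT_MS):
    """Run cmd once and return (elapsed_ms, timed_out)."""
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_ms / 1000.0,
        )
        timed_out = False
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run when the timeout expires.
        timed_out = True
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return elapsed_ms, timed_out


if __name__ == "__main__":
    # Placeholder command (hypothetical): replace it with a cheap local
    # check against the database instance.
    cmd = [sys.executable, "-c", "pass"]
    elapsed_ms, timed_out = time_command(cmd)
    print(f"elapsed: {elapsed_ms:.0f} ms, timed out: {timed_out}")
```

Running something like this at intervals while the disks or memory are under load may show whether the check itself gets close to the timeout. It does not reproduce what the resource agent's monitor action actually does.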

> 
> I also noticed a recurring message:
> Update score of "ltaoperdbsXX" from 990 to 1000 because of a change in the
> replication lag
> Does this point to some kind of network lag?
> 
> The network should be 10 Gb/s; both corosync and production traffic run on it.
> Network bonding is configured on all of the nodes.
> PAF version resource-agents-paf-2.3.0-1.rhel7.noarch
> Postgres psql (13.1)
> pacemaker-1.1.23-1.el7.x86_64
> pcs-0.9.169-3.el7.centos.x86_64
> 
> I attached the log, which could be useful for digging further.
> If anyone can point me in the right direction, it would be really appreciated.
> 
> thanks for the support
> Pepe