[ClusterLabs] Help with PostgreSQL Automatic Failover demotion

Fri Feb 18 18:52:00 EST 2022

Hello,

On Fri, 18 Feb 2022 21:44:58 +0000
"Larry G. Mills" <lgmills at fnal.gov> wrote:

> ... This happened again recently, and the running primary DB was demoted and
> then re-promoted to be the running primary. What I'm having trouble
> understanding is why the running Master/primary DB was demoted.  After the
> monitor operation timed out, the failcount for the ha-db resource was still
> less than the configured "migration-threshold", which is set to 5.

Because "migration-threshold" is the limit before the resource is moved away
from the node.

As long as your failcount is less than "migration-threshold" and the failure
is not fatal, the cluster will keep the resource on the same node and try to
"recover" it by running a full restart: demote -> stop -> start -> promote.
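To make the failcount/threshold logic concrete, here is a small Python model
of the decision described above. It is only a simplified sketch: the function
name plan_recovery() and its "fatal" flag are hypothetical, and it ignores
stickiness, constraints, failure-timeout and on-fail settings. It only shows
why a single monitor timeout with migration-threshold=5 still ends with a
demote on the same node.

```python
from typing import List

# Full in-place recovery of a promoted instance: demote -> stop -> start -> promote.
RESTART_SEQUENCE: List[str] = ["demote", "stop", "start", "promote"]


def plan_recovery(fail_count: int, migration_threshold: int,
                  fatal: bool = False) -> List[str]:
    """Hypothetical, simplified model of the actions run on the node where
    the promoted instance's monitor failed."""
    if fatal or fail_count >= migration_threshold:
        # The node becomes ineligible for the resource: the instance is
        # demoted and stopped here, and another node gets promoted instead.
        return ["demote", "stop"]
    # Still below the threshold: keep the resource on this node and
    # recover it with a full restart, which begins with a demote.
    return list(RESTART_SEQUENCE)


# The scenario from the question: one monitor timeout, migration-threshold=5.
print(plan_recovery(fail_count=1, migration_threshold=5))
# prints ['demote', 'stop', 'start', 'promote']
```

In other words, "below migration-threshold" does not mean "no action"; it
means "recover here instead of moving away", and for a promoted instance
that recovery starts with a demote.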

Since Pacemaker 2.0.5, the recovery can instead be limited to demote -> promote
by setting on-fail="demote" on the monitor of the promoted role. See the
"on-fail" property and the detail about it below the table:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#operation-properties
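A rough Python sketch of how the two on-fail values discussed here change
the recovery of a failed promoted-role monitor (failcount still below
migration-threshold). The mapping and the helper recovery_actions() are
illustrative assumptions, not Pacemaker code, and only model these two
values; check the table linked above for the full list and its restrictions.

```python
from typing import Dict, List

# Hypothetical mapping: failed monitor on the promoted instance, failcount
# still below migration-threshold. Only the two on-fail values discussed here.
RECOVERY_BY_ON_FAIL: Dict[str, List[str]] = {
    "restart": ["demote", "stop", "start", "promote"],  # default for monitor
    "demote": ["demote", "promote"],                    # Pacemaker 2.0.5 and later
}


def recovery_actions(on_fail: str = "restart") -> List[str]:
    """Return the (simplified) recovery sequence for the given on-fail value."""
    if on_fail not in RECOVERY_BY_ON_FAIL:
        raise ValueError(f"on-fail={on_fail!r} is not modelled in this sketch")
    return list(RECOVERY_BY_ON_FAIL[on_fail])


print(recovery_actions())          # default: full restart
print(recovery_actions("demote"))  # demote, then promote again
```

With on-fail="demote" the database is not stopped, which avoids the full
restart, but it is still demoted, so a short loss of the primary role remains.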

Regards,