[ClusterLabs] PostgreSQL cluster with Pacemaker+PAF problems

Fri Mar 6 18:20:37 EST 2020

On Fri, 6 Mar 2020 18:38:37 +0100
Aleksandra C <aleksandra29c at gmail.com> wrote:
[...]
> pgsqlms(pgsqld)[3824]:  Mar 06 18:04:21  WARNING: No secondary
> connected to the master
> pgsqlms(pgsqld)[3824]:  Mar 06 18:04:21  WARNING: "server1" is not
> connected to the primary

Here, pgsqlms is warning you that no standby is replicating with the primary.

[...]
> Here I can see this kind of logs:
> Action 6 (pgsqld_demote_0) on server1 failed (target: 0 vs. rc: 1): Error
> Action 2 (pgsqld_stop_0) on server1 failed (target: 0 vs. rc: 1): Error

An error on stop leads to fencing, so that Pacemaker can be sure the instance
is really stopped.

> It seems it can not demote the previous master as a slave. Recovery.conf
> file is present at this (failed) node.

Before every action, PAF checks the status of the instance. Here, your
instance was probably detected as crashed, so whatever the action, demote or
stop, an error is raised. pgsqlms is supposed to log a message about that
somewhere...

> Should I assume that every ungraceful shutdown scenario (and even manual
> fence) would result with node failover (so I should rebuild the instance)?

As soon as you let an automated system take care of your instance, you *must*
go through it for any action you would otherwise do by hand: starting or
stopping your PostgreSQL.

If you want to shut down your node, do it gracefully:

  pcs resource disable pgsql-ha --wait
  pcs cluster stop --all
  poweroff

In your sentence, "every ungraceful" is scary. You are not supposed to shut down
your nodes ungracefully, neither once nor often. But if you do need to shut down
ungracefully, put your cluster in maintenance mode first and clean up the mess
after reboot, before leaving maintenance mode. See the PAF and Pacemaker docs
about maintenance mode.
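A minimal sketch of that sequence with pcs, reusing the `pgsql-ha` resource name from the commands above. The repair work in the middle depends on the state each instance is left in, so it is only described in a comment; treat this as an outline to check against the docs, not a complete procedure.

```shell
# 1. Before the ungraceful shutdown: Pacemaker keeps monitoring
#    but stops starting, stopping or promoting resources.
pcs property set maintenance-mode=true

# 2. Shut down / reboot the node(s), start the cluster stack again
#    (pcs cluster start --all), then repair PostgreSQL by hand so each
#    instance is in the expected role (one primary, standbys replicating).

# 3. Clear the failures recorded for the resource so Pacemaker re-probes it.
pcs resource cleanup pgsql-ha

# 4. Hand control back to Pacemaker.
pcs property set maintenance-mode=false
```

Running the cleanup while still in maintenance mode lets you check the detected state with `pcs status` before Pacemaker is allowed to act on it again.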

To answer your question, yes, after a failover, you must rebuild your old
primary as a standby. See:
https://clusterlabs.github.io/PAF/administration.html#failover