[ClusterLabs] PostgreSQL server timelines offset after promote
FLORAC Thierry
thierry.florac at onf.fr
Tue Apr 2 12:09:13 EDT 2024
Hi Jehan-Guillaume,
Thanks for your links, but I had already read them (except the last one) when setting up Pacemaker for the first time.
Actually, I forgot to mention that PostgreSQL and Pacemaker run on a Debian GNU/Linux system (latest "Bookworm" release); on reboot, Pacemaker is stopped using "systemctl stop", and resources *seem* to be migrated correctly to the promoted slave.
When the previous master is restarted, Pacemaker is restarted automatically and the node is instantly promoted back to master; it *seems* that it's at this moment that a new timeline is sometimes created on the backup server...
Don't you think that I should disable the automatic restart of Pacemaker on the master server, and handle promotion manually when a switch occurs?
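On Debian, I suppose that would simply be something like this (assuming the stock systemd unit name):

    # keep Pacemaker from starting by itself at boot
    systemctl disable pacemaker
    # later, start it by hand once the node is ready to rejoin
    systemctl start pacemaker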
Best regards,
Thierry
--
Thierry Florac
Resp. Pôle Architecture Applicative et Mobile
DSI - Dépt. Études et Solutions Transverses
2 bis avenue du Général Leclerc - CS 30042
94704 MAISONS-ALFORT Cedex
Tél : 01 40 19 59 64 - 06 26 53 42 09
www.onf.fr<https://www.onf.fr>
________________________________
From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
Sent: Wednesday, 27 March 2024 09:04
To: FLORAC Thierry <thierry.florac at onf.fr>
Cc: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] PostgreSQL server timelines offset after promote
Bonjour Thierry,
On Mon, 25 Mar 2024 10:55:06 +0000
FLORAC Thierry <thierry.florac at onf.fr> wrote:
> I'm trying to create a PostgreSQL master/slave cluster using streaming
> replication and the pgsqlms agent. The cluster is OK, but my problem is this:
> the master node is sometimes restarted for system operations, and the slave
> is then promoted without any problem;
When you have to do some planned system operation, you **must** diligently
ask Pacemaker for permission. Pacemaker is the real owner of your resource. It
will react to any unexpected event, even a planned one. You should consider it
as a hidden colleague taking care of your resource.
There are various ways to deal with Pacemaker when you need to do some system
maintenance on your primary, depending on your constraints. Here are two
examples, each followed by a command sketch:
* ask Pacemaker to move the "promoted" role to another node
* then put the node in standby mode
* then do your admin tasks
* then unstandby your node: a standby instance should start on the original node
* optional: move the "promoted" role back to the original node
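With pcs, that first procedure could look roughly like this (resource and node
names are examples; depending on the pcs version, the role option is spelled
--master or --promoted):

    # move the promoted role away from the node to be serviced
    pcs resource move pgsql-ha node2 --promoted
    # stop everything Pacemaker manages on that node
    pcs node standby node1
    # ... do the system maintenance, reboot, etc. ...
    # bring the node back: a standby instance should start on it
    pcs node unstandby node1
    # optional: move the promoted role back, then remove the
    # location constraint left behind by "move"
    pcs resource move pgsql-ha node1 --promoted
    pcs resource clear pgsql-ha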
Or:
* put the whole cluster in maintenance mode
* then do your admin tasks
* then check everything works as the cluster expect
* then exit the maintenance mode
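In command form, the second procedure is essentially (same sketch caveat):

    # tell Pacemaker to stop managing and monitoring its resources
    pcs property set maintenance-mode=true
    # ... do the system maintenance ...
    # check that everything is in the state Pacemaker expects,
    # then hand control back
    pcs property set maintenance-mode=false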
The second one might be tricky if Pacemaker finds some unexpected status/event
when exiting maintenance mode.
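Before leaving maintenance mode, it is worth comparing what Pacemaker believes
with reality, for instance:

    # one-shot view of the cluster state as Pacemaker sees it
    crm_mon -1
    # re-probe the resource if in doubt (resource name is an example;
    # older Pacemaker uses --cleanup instead of --refresh)
    crm_resource --refresh --resource pgsql-ha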
You can find (old) examples of administrative tasks here:
* with pcs: https://clusterlabs.github.io/PAF/CentOS-7-admin-cookbook.html
* with crm: https://clusterlabs.github.io/PAF/Debian-8-admin-cookbook.html
* with "low level" commands:
https://clusterlabs.github.io/PAF/administration.html
These doc updates are long overdue, sorry about that :(
Also, here is a hidden gist (that needs some updates as well):
https://github.com/ClusterLabs/PAF/tree/workshop/docs/workshop/fr
> after reboot, the old master is re-promoted, but I often get an error in
> the slave logs:
>
> FATAL: la plus grande timeline 1 du serveur principal est derrière la
> timeline de restauration 2
>
> which can be translated to English as:
>
> FATAL: highest timeline 1 of the primary is behind recovery timeline
> 2
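(You can check which timeline each instance is on with standard PostgreSQL
tooling; adjust the data directory and connection settings to your setup:

    # timeline recorded in the local instance's control file
    pg_controldata "$PGDATA" | grep -i timeline
    # on a running standby: timeline currently received from the primary
    psql -Atc "SELECT received_tli FROM pg_stat_wal_receiver;"
)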
This is unexpected. I wonder how Pacemaker is being stopped: it is supposed to
stop its resources gracefully. The promotion scores should be updated to reflect
that the local resource is not a primary anymore, and PostgreSQL should be
demoted, then stopped. After such a graceful shutdown, it is supposed to start
again as a standby.
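One way to check this is to watch Pacemaker's actions while the node shuts
down, e.g. (unit and resource names may differ on your system):

    # on the node being stopped, follow the resource operations
    journalctl -u pacemaker -f
    # from another node, watch roles and node attributes
    # (PAF's promotion scores appear as node attributes)
    crm_mon -1A

A graceful stop should show a demote of the pgsqlms resource followed by a
stop, not a plain kill of the primary.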
Regards,