[ClusterLabs] PAF with postgresql 13?

Mon Mar 7 07:31:07 EST 2022

Hello,

I am getting back in touch with you about the PAF problem on Debian with PostgreSQL 13.
I followed the attached guide to configure the Pacemaker cluster.
Replication between the PostgreSQL databases works fine. However, failover of the VIP and of the master/slave role of the pgsqld resource does not work when one of the two nodes crashes.

Here is the cluster configuration:

Resources:
Clone: pgsqld-clone
  Meta Attrs: PGDATA=/var/lib/postgresql/13/main bindir=/usr/lib/postgresql/13/bin promotable=true start_opts="-c config_file=/var/lib/postgresql/13/main/postgresql.conf"
  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
   Meta Attrs: master-max=1 notify=true
   Operations: demote interval=0s timeout=120s (pgsqld-demote-interval-0s)
               methods interval=0s timeout=5 (pgsqld-methods-interval-0s)
               monitor interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
               monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
               notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
               promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
               reload interval=0s timeout=20 (pgsqld-reload-interval-0s)
               start interval=0s timeout=60s (pgsqld-start-interval-0s)
               stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
Resource: VIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 ip=10.1.2.3
  Operations: monitor interval=10s (VIP-monitor-interval-10s)
              start interval=0s timeout=20s (VIP-start-interval-0s)
              stop interval=0s timeout=20s (VIP-stop-interval-0s)
Stonith Devices:
Resource: fence_node1 (class=stonith type=fence_vmware_rest)
  Attributes: ipaddr=10.1.4.3 login=user password=pwd pcmk_host_check=static-list pcmk_host_list=node1 pcmk_reboot_action=reboot ssl=1 ssl_insecure=1
  Operations: monitor interval=60s (fence_node1-monitor-interval-60s)
Resource: fence_node2 (class=stonith type=fence_vmware_rest)
  Attributes: ipaddr=10.1.4.3 login=user password=pwd pcmk_host_check=static-list pcmk_host_list=node2 pcmk_reboot_action=reboot ssl=1 ssl_insecure=1
  Operations: monitor interval=60s (fence_node2-monitor-interval-60s)
Fencing Levels:
Location Constraints:
  Resource: fence_node1
    Disabled on:
      Node: node1 (score:-INFINITY) (id:location-fence_node1-node1--INFINITY)
  Resource: fence_node2
    Disabled on:
      Node: node2 (score:-INFINITY) (id:location-fence_node2-node2--INFINITY)
Ordering Constraints:
  promote pgsqld-clone then start VIP (kind:Mandatory) (non-symmetrical) (id:order-pgsqld-clone-VIP-Mandatory)
  demote pgsqld-clone then stop VIP (kind:Mandatory) (non-symmetrical) (id:order-pgsqld-clone-VIP-Mandatory-1)
Colocation Constraints:
  VIP with pgsqld-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-VIP-pgsqld-clone-INFINITY)
Ticket Constraints:
Alerts:
  No alerts defined
Resources Defaults:
  Meta Attrs: rsc_defaults-meta_attributes
    migration-threhold=3
Operations Defaults:
  No defaults set
Cluster Properties:
  cluster-infrastructure: corosync
  cluster-name: cluster_pgsql
  dc-version: 2.0.5-ba59be7122
  have-watchdog: false
  last-lrm-refresh: 1645462501
  no-quorum-policy: ignore
  stonith-enabled: true
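The ordering and colocation constraints above tie VIP to the pgsqld Master: VIP may only run on the node where pgsqld is promoted, and only after that promotion. The toy Python model below illustrates this. It is not Pacemaker's scheduler; the function names are hypothetical, and picking the first standby is a simplification (Pacemaker actually uses promotion scores). Under this model, a failed promote on node2 leaves no Master, so VIP has nowhere to run.

```python
from typing import Dict, Optional, Tuple

# Toy model only: role strings mirror the pcs output ("Master", "Slave").
# Real placement is decided by Pacemaker's scheduler using promotion scores.


def vip_node(roles: Dict[str, str]) -> Optional[str]:
    """Where VIP may run under 'VIP with pgsqld-clone (score:INFINITY)
    (with-rsc-role:Master)': only on a node whose pgsqld is Master."""
    masters = [node for node, role in roles.items() if role == "Master"]
    return masters[0] if masters else None


def fail_node(roles: Dict[str, str], failed: str,
              promote_ok: bool) -> Tuple[Dict[str, str], Optional[str]]:
    """Simulate losing `failed`, then 'promote pgsqld-clone then start VIP':
    VIP is placed only after a promotion has succeeded."""
    remaining = {n: r for n, r in roles.items() if n != failed}
    if "Master" not in remaining.values():
        standbys = [n for n, r in remaining.items() if r == "Slave"]
        if standbys and promote_ok:
            # Simplification: promote the first standby found.
            remaining[standbys[0]] = "Master"
    return remaining, vip_node(remaining)


before = {"node1": "Master", "node2": "Slave"}
print(fail_node(before, "node1", promote_ok=True))   # ({'node2': 'Master'}, 'node2')
print(fail_node(before, "node1", promote_ok=False))  # ({'node2': 'Slave'}, None)
```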

When I trigger a failure on node1, the second node is unable to take over the VIP and the master role of the pgsqld resource. The error is as follows:

  * pgsqld_promote_0 on node2 'error' (1): call=24, status='complete', exitreason='Can not get current node LSN location', last-rc-change='2022-03-07 13:17:52 +01:00', queued=0ms, exec=183ms
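The error refers to the node's LSN (log sequence number), a WAL position that PostgreSQL prints as two hexadecimal halves, for example `0/3000060`. As a rough illustration only, and assuming the agent needs each standby's last known LSN before it can confirm that the node being promoted is the most up to date, the Python sketch below compares such values. It is not PAF's code; the standby names and LSN values are hypothetical.

```python
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/3000060' (high/low 32-bit halves in hex)
    into a single comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def most_advanced(standbys: Dict[str, Optional[str]]) -> str:
    """Return the standby with the highest LSN.

    A standby with no recorded LSN (None) cannot be ranked, so this sketch
    refuses to choose, loosely analogous to a promotion that stops with
    'Can not get current node LSN location'."""
    known = {name: lsn for name, lsn in standbys.items() if lsn is not None}
    missing = sorted(set(standbys) - set(known))
    if missing:
        raise ValueError("no LSN recorded for: " + ", ".join(missing))
    return max(known, key=lambda name: parse_lsn(known[name]))


# Hypothetical values, not taken from the cluster above.
print(most_advanced({"standby_a": "0/3000060", "standby_b": "0/2FFFFFF"}))  # standby_a
```

Comparing the parsed integers rather than the raw strings matters, because a plain string comparison would misorder LSNs whose hexadecimal parts have different lengths.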

However, I can promote the node to master manually with the pg_ctl promote command, but the goal is for Pacemaker to do it.

Do you have any leads on my problem?

Thanks

-----Original Message-----
From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
Sent: Monday, 21 February 2022 17:01
To: CHAMPAGNE Julie <julie.champagne at pm.gouv.fr>
Cc: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] PAF with postgresql 13?

On Mon, 21 Feb 2022 09:04:27 +0000
CHAMPAGNE Julie <julie.champagne at pm.gouv.fr> wrote:

...

> The last release is 2 years old, is it still in development?

There's no activity because there's not much to do on it. PAF is mainly in maintenance (bug fix) mode.

I have a few ideas here and there. They might land sooner or later, but nothing really fancy. It just works.

The current effort is to revive the old workshop that was written a few years ago and translate it into English.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAF_PGDayFR_2018-06-26.pdf
Type: application/pdf
Size: 167234 bytes
Desc: PAF_PGDayFR_2018-06-26.pdf
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20220307/d829f903/attachment-0001.pdf>