[ClusterLabs] PAF with postgresql 13?

Mon Mar 7 09:32:46 EST 2022

Hi,

Thanks for your quick answer. Here are my answers:

I tried 2 different tests to create a problem in the cluster:
- killing postgres processes on node1: pkill postgres
- putting node1 in standby in the cluster with the following command: pcs node standby node1
Both produced the same error about the LSN location. From the pacemaker logs:

pgsqlms(pgsqld)[5180]:  Mar 07 15:29:48  ERROR: Can not get current node LSN location
Mar 07 15:29:48 node2 pacemaker-execd     [692] (log_op_output)  notice: pgsqld_promote_0[5180] error output [ Could not query value of cancel_switchover-pgsqld: attribute does not exist ]
Mar 07 15:29:48 node2 pacemaker-execd     [692] (log_op_output)  notice: pgsqld_promote_0[5180] error output [ Could not query value of recover_master-pgsqld: attribute does not exist ]
Mar 07 15:29:48 node2 pacemaker-execd     [692] (log_op_output)  notice: pgsqld_promote_0[5180] error output [ Could not query value of nodes-pgsqld: attribute does not exist ]
Mar 07 15:29:48 node2 pacemaker-execd     [692] (log_op_output)  notice: pgsqld_promote_0[5180] error output [ Could not query value of lsn_location-pgsqld: attribute does not exist ]
Mar 07 15:29:48 node2 pacemaker-execd     [692] (log_op_output)  notice: pgsqld_promote_0[5180] error output [ ocf-exit-reason:Can not get current node LSN location ]
Mar 07 15:29:48 node2 pacemaker-execd     [692] (log_finished)   info: pgsqld promote (call 233, PID 5180) exited with status 1 (execution time 183ms, queue time 0ms)
Mar 07 15:29:48 node2 pacemaker-controld  [695] (process_lrm_event)      notice: Result of promote operation for pgsqld on node2: error | rc=1 call=233 key=pgsqld_promote_0 confirmed=true cib-update=382
Mar 07 15:29:48 node2 pacemaker-controld  [695] (process_lrm_event)      notice: node2-pgsqld_promote_0:233 [ Could not query value of cancel_switchover-pgsqld: attribute does not exist\nCould not query value of recover_master-pgsqld: attribute does not exist\nCould not query value of nodes-pgsqld: attribute does not exist\nCould not query value of lsn_location-pgsqld: attribute does not exist\nocf-exit-reason:Can not get current node LSN location\n ]
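
To see at a glance which attributes the agent failed to read, a log excerpt like the one above can be scanned for the "Could not query value of ..." message. This is only a rough sketch: the regular expression is derived from the exact wording in these log lines, and the function name `missing_attributes` is made up for illustration.

```python
import re

# Pattern built from the message wording seen in the log excerpt above.
MISSING_ATTR = re.compile(
    r"Could not query value of ([\w-]+): attribute does not exist"
)

def missing_attributes(log_text):
    """Return attribute names reported as missing, first-seen order, no duplicates."""
    seen = []
    for name in MISSING_ATTR.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen

# Two lines shortened from the excerpt above.
sample = (
    "notice: pgsqld_promote_0[5180] error output [ Could not query value of "
    "cancel_switchover-pgsqld: attribute does not exist ]\n"
    "notice: pgsqld_promote_0[5180] error output [ Could not query value of "
    "lsn_location-pgsqld: attribute does not exist ]\n"
)
print(missing_attributes(sample))
```

Run against the full excerpt, this should list the four names that appear in it: cancel_switchover-pgsqld, recover_master-pgsqld, nodes-pgsqld and lsn_location-pgsqld.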

Our versions are Debian 11.2 and PAF 2.3.0 (installed from debian repository).

The attrd_updater command returns:

root@node1 ~ > attrd_updater --private --lifetime reboot --name "lsn_location-pgsqld" --query
Could not query value of lsn_location-pgsqld: attribute does not exist
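
The same query can be repeated for each attribute named in the promote error output. The sketch below reuses exactly the attrd_updater options from the command above; the helper names `build_query` and `query_all` are hypothetical. It only reports what attrd_updater returns and does not tell whether a given attribute is expected to exist at that moment.

```python
import shutil
import subprocess

# Attribute names copied from the "Could not query value of ..." log lines.
ATTRIBUTES = [
    "cancel_switchover-pgsqld",
    "recover_master-pgsqld",
    "nodes-pgsqld",
    "lsn_location-pgsqld",
]

def build_query(name):
    """Build the same attrd_updater query used in this thread for one attribute."""
    return ["attrd_updater", "--private", "--lifetime", "reboot",
            "--name", name, "--query"]

def query_all(names=ATTRIBUTES):
    """Run the query for each name; return {} if attrd_updater is not installed."""
    if shutil.which("attrd_updater") is None:
        return {}
    results = {}
    for name in names:
        proc = subprocess.run(build_query(name), capture_output=True, text=True)
        # Keep whichever stream carries the message, plus the exit code.
        results[name] = (proc.returncode, (proc.stdout or proc.stderr).strip())
    return results

if __name__ == "__main__":
    for name, (rc, out) in query_all().items():
        print(f"{name}: rc={rc} {out}")
```

It has to be run as root on a cluster node, like the command above.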

Thanks!

-----Original Message-----
From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
Sent: Monday, March 7, 2022 15:15
To: CHAMPAGNE Julie <julie.champagne at pm.gouv.fr>
Cc: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] PAF with postgresql 13?

Hi,

Caution, this is an English-speaking mailing list :)

My answers are below.

On Mon, 7 Mar 2022 12:31:07 +0000
CHAMPAGNE Julie <julie.champagne at pm.gouv.fr> wrote:

> When I create a problem on node1,

What's the issue you are testing precisely?

>   * pgsqld_promote_0 on node2 'error' (1): call=24, status='complete', 
> exitreason='Can not get current node LSN location',

It seems the agent had some trouble getting some private attributes from the cluster. Could you give the exact:

* Debian version
* PAF version

Do you find any errors in the logs about setting/getting the lsn_location attribute?

What is the result of the following command:

  attrd_updater --private --lifetime reboot --name "lsn_location-pgsqld" --query

Thanks,