[ClusterLabs] PAF with postgresql 13?

Tue Mar 8 11:20:05 EST 2022

Hi,

Sorry, your mail was really hard to read on my side, but I think I understood
it and will try to answer below.

On Tue, 8 Mar 2022 11:45:30 +0000
lejeczek via Users <users at clusterlabs.org> wrote:

> On 08/03/2022 10:21, Jehan-Guillaume de Rorthais wrote:
> >> op start timeout=60s \ op stop timeout=60s \ op promote timeout=30s \
> >> op demote timeout=120s \ op monitor interval=15s timeout=10s
> >> role="Master" meta master-max=1 \ op monitor interval=16s
> >> timeout=10s role="Slave" \ op notify timeout=60s meta notify=true
> >
> > Because "op" appears, we are back in resource ("pgsqld") context,
> > anything after is interpreted as ressource and operation attributes,
> > even the "meta notify=true". That's why your pgsqld-clone doesn't
> > have the meta attribute "notify=true" set.
>
> Here is one-liner that should do - add, as per 'debug-'
> suggestion, 'master-max=1'

What debug- suggestion??

...
> then do:
> 
> -> $ pcs resource delete pgsqld  
> 
> '-clone' should get removed too, so now no 'pgsqld' 
> resource(s) but cluster - weirdly in my mind - leaves node 
> attributes on.

indeed.

> I see 'master-pgsqld' with each node and do not see why 
> 'node attributes' should be kept(certainly shown) for 
> non-existent resources(to which only resources those attrs 
> are instinct)
> So, you want to "clean" that for, perhaps for now you are 
> not going to have/use 'pgsqlms', you can do that with:
> 
> -> $ pcs node attribute node1 master-pgsqld="" # same for remaining nodes

indeed.
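
For the record, that cleanup could be scripted roughly as below. This is only
a sketch: the node list is a placeholder to replace with your own node names,
the function name is made up, and it merely prints the commands unless you set
DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: clear the leftover "master-pgsqld" node attribute on every node.
# NODES is a placeholder list (assumption); replace it with your node names.
NODES="${NODES:-node1 node2 node3}"

clear_promotion_scores() {
    for node in $NODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (default): only show what would be executed.
            printf 'pcs node attribute %s master-pgsqld=""\n' "$node"
        else
            # An empty value removes the attribute, as in the command quoted above.
            pcs node attribute "$node" master-pgsqld=""
        fi
    done
}
```

Call "clear_promotion_scores" once to review the output, then again with
DRY_RUN=0 to actually apply it.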

> now .. ! repeat your one-liner which worked just a moment 
> ago and you should get exact same or similar errors(while 
> all nodes are stuck on 'slave'

You get no promotion because your PostgreSQL instances have been stopped
in standby mode. The cluster has no way and no score to promote one of them.

> -> $ pcs resource debug-promote pgsqld  
> crm_resource: Error performing operation: Error occurred
> Operation force-promote for pgsqld (ocf:heartbeat:pgsqlms) 
> returned 1 (error: Can not get current node LSN location)
> /tmp:5432 - accepting connections

NEVER use "debug-promote" or any other "debug-*" command with pgsqlms, or any
other cloned resource. AFAIK, these commands work fine for "stateless"
resources, but do not (cannot) create the environment required by clone and
multi-state ones.

So I repeat, NEVER use "debug-promote".

What you want to do is set the promotion score on the node where you want the
promotion to happen. E.g.:

  pcs node attribute srv1 master-pgsqld=1001

You can use "crm_attribute" or "crm_master" as well.
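
As a rough sketch of the equivalent invocations, the helper below only prints
the command for each tool, so nothing touches the cluster. The function name is
made up for illustration. I pass "-l forever" explicitly because, as far as I
know, "pcs node attribute" writes a permanent attribute whereas "crm_master"
defaults to a transient one; check the man pages of your Pacemaker version.

```shell
#!/bin/sh
# Sketch: print the command that sets a promotion score for "pgsqld" on a node.
# promotion_score_cmd is a hypothetical helper; it does not run anything.
promotion_score_cmd() {
    tool="$1"; node="$2"; score="$3"
    case "$tool" in
        pcs)
            echo "pcs node attribute $node master-pgsqld=$score" ;;
        crm_attribute)
            # -N node, -n attribute name, -l lifetime, -v value
            echo "crm_attribute -N $node -n master-pgsqld -l forever -v $score" ;;
        crm_master)
            # -r resource (needed when called outside a resource agent)
            echo "crm_master -r pgsqld -N $node -l forever -v $score" ;;
        *)
            echo "unknown tool: $tool" >&2
            return 1 ;;
    esac
}
```

For example, "promotion_score_cmd crm_master srv1 1001" prints the crm_master
form of the pcs command above.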

> ocf-exit-reason:Can not get current node LSN location

This one is probably because of "debug-promote".

> You have to 'cib-push' to "fix" this very problem.
> In my(admin's) opinion this is a 100% candidate for a bug - 
> whether in PCS or PAF - perhaps authors may wish to comment?

Removing the node attributes along with the resource might be legitimate from
the Pacemaker point of view, but I'm not sure how it could track the dependency
(ping Ken?).

PAF has no way to know the resource is being deleted, so it cannot remove its
node attributes beforehand.

Maybe PCS could look for promotion scores and remove them during the "resource
delete" command (ping Tomas)?

Regards,