[ClusterLabs] PAF with postgresql 13?

Tue Mar 8 06:45:30 EST 2022

On 08/03/2022 10:21, Jehan-Guillaume de Rorthais wrote:
>> op start timeout=60s \
>> op stop timeout=60s \
>> op promote timeout=30s \
>> op demote timeout=120s \
>> op monitor interval=15s timeout=10s role="Master" meta master-max=1 \
>> op monitor interval=16s timeout=10s role="Slave" \
>> op notify timeout=60s meta notify=true
>
> Because "op" appears, we are back in resource ("pgsqld") context,
> anything after is interpreted as ressource and operation attributes,
> even the "meta notify=true". That's why your pgsqld-clone doesn't
> have the meta attribute "notify=true" set.

Here is a one-liner that should do it. As per the 'debug-'
suggestion, it adds 'master-max=1':

-> $ pcs resource create pgsqld ocf:heartbeat:pgsqlms \
     bindir=/usr/bin pgdata=/var/lib/pgsql/data \
     op start timeout=60s op stop timeout=60s \
     op promote timeout=30s op demote timeout=120s \
     op monitor interval=15s timeout=10s role="Master" \
     op monitor interval=16s timeout=10s role="Slave" \
     op notify timeout=60s \
     promotable notify=true master-max=1 \
   && pcs constraint colocation add HA-10-1-1-226 \
     with master pgsqld-clone INFINITY \
   && pcs constraint order promote pgsqld-clone \
     then start HA-10-1-1-226 symmetrical=false kind=Mandatory \
   && pcs constraint order demote pgsqld-clone \
     then stop HA-10-1-1-226 symmetrical=false kind=Mandatory

But this "issue" is reproducible! Once you have a working
'pgsqlms', do:

-> $ pcs resource delete pgsqld

The '-clone' should get removed too, so now there are no 'pgsqld'
resources, but the cluster (weirdly, to my mind) leaves the node
attributes in place.
I see 'master-pgsqld' on each node and do not see why node
attributes should be kept (and certainly not why they should be
shown) for non-existent resources, the only resources to which
those attributes are intrinsic.
So, if you want to clean that up, because perhaps for now you are
not going to use 'pgsqlms', you can do that with:

-> $ pcs node attribute node1 master-pgsqld=""   # same for remaining nodes
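
The "same for remaining nodes" step can be sketched as a small loop. This is only an illustration: the node names node1..node3 and the `run` helper with its DRY_RUN switch are my assumptions, not anything from the cluster above. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Hypothetical node names; replace with the ones `pcs status nodes` reports.
NODES="${NODES:-node1 node2 node3}"

# Illustration helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# pcs removes a node attribute when it is set without a value.
clear_master_scores() {
    for node in $NODES; do
        run pcs node attribute "$node" master-pgsqld=
    done
}

clear_master_scores
```

Afterwards, `pcs node attribute` without arguments should list the attributes left on each node.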

Now repeat your one-liner, which worked just a moment ago, and
you should get the exact same or similar errors (while all nodes
are stuck on 'slave'):

-> $ pcs resource debug-promote pgsqld
crm_resource: Error performing operation: Error occurred
Operation force-promote for pgsqld (ocf:heartbeat:pgsqlms) 
returned 1 (error: Can not get current node LSN location)
/tmp:5432 - accepting connections

ocf-exit-reason:Can not get current node LSN location
--------------------
You have to 'cib-push' to "fix" this very problem.
In my (admin's) opinion this is a 100% candidate for a bug,
whether in PCS or PAF. Perhaps the authors may wish to comment?
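
For anyone wanting to try the 'cib-push' route, here is a rough sketch of what I mean, assuming the idea is to stage the resource and its constraints in an offline CIB file and push everything in one go. The file name cluster.cib and the `run` helper with its DRY_RUN switch are made up for illustration; the resource and constraint arguments are the ones from the one-liner above. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical file name for the offline copy of the CIB.
CIB_FILE="${CIB_FILE:-cluster.cib}"

# Illustration helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stage_and_push() {
    # 1. Dump the live CIB to a file.
    run pcs cluster cib "$CIB_FILE"

    # 2. Apply every change to the file only (-f), not to the live cluster.
    run pcs -f "$CIB_FILE" resource create pgsqld ocf:heartbeat:pgsqlms \
        bindir=/usr/bin pgdata=/var/lib/pgsql/data \
        op start timeout=60s op stop timeout=60s \
        op promote timeout=30s op demote timeout=120s \
        op monitor interval=15s timeout=10s role="Master" \
        op monitor interval=16s timeout=10s role="Slave" \
        op notify timeout=60s \
        promotable notify=true master-max=1
    run pcs -f "$CIB_FILE" constraint colocation add HA-10-1-1-226 \
        with master pgsqld-clone INFINITY
    run pcs -f "$CIB_FILE" constraint order promote pgsqld-clone \
        then start HA-10-1-1-226 symmetrical=false kind=Mandatory
    run pcs -f "$CIB_FILE" constraint order demote pgsqld-clone \
        then stop HA-10-1-1-226 symmetrical=false kind=Mandatory

    # 3. Push the modified file to the cluster as a single change.
    run pcs cluster cib-push "$CIB_FILE"
}

stage_and_push
```

The difference from the one-liner is that the cluster sees the resource, its clone options and the constraints all at once, rather than one command at a time.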

many thanks, L.