[ClusterLabs] PAF with postgresql 13?

Tomas Jelinek tojeline at redhat.com
Wed Mar 9 11:55:05 EST 2022


On 08. 03. 22 at 23:08, Ken Gaillot wrote:
> On Tue, 2022-03-08 at 17:20 +0100, Jehan-Guillaume de Rorthais wrote:
>> Hi,
>>
>> Sorry, your mail was really hard to read on my side, but I think I
>> understood it and will try to answer below.
>>
>> On Tue, 8 Mar 2022 11:45:30 +0000
>> lejeczek via Users <users at clusterlabs.org> wrote:
>>
>>> On 08/03/2022 10:21, Jehan-Guillaume de Rorthais wrote:
>>>>> op start timeout=60s \
>>>>> op stop timeout=60s \
>>>>> op promote timeout=30s \
>>>>> op demote timeout=120s \
>>>>> op monitor interval=15s timeout=10s role="Master" meta master-max=1 \
>>>>> op monitor interval=16s timeout=10s role="Slave" \
>>>>> op notify timeout=60s meta notify=true
>>>>
>>>> Because "op" appears, we are back in resource ("pgsqld") context;
>>>> anything after it is interpreted as resource and operation attributes,
>>>> even the "meta notify=true". That's why your pgsqld-clone doesn't
>>>> have the meta attribute "notify=true" set.
>>>
>>> Here is a one-liner that should do it: add, as per the 'debug-'
>>> suggestion, 'master-max=1'
>>
>> What debug- suggestion??
>>
>> ...
>>> then do:
>>>
>>> -> $ pcs resource delete pgsqld
>>>
>>> '-clone' should get removed too, so now there are no 'pgsqld'
>>> resource(s), but the cluster - weirdly, to my mind - leaves the
>>> node attributes on.
>>
>> indeed.
>>
>>> I see 'master-pgsqld' on each node and do not see why
>>> 'node attributes' should be kept (and certainly not shown) for
>>> non-existent resources (the only resources to which those attrs
>>> are intrinsic).
>>> So, if you want to "clean" that up because, perhaps, you are
>>> not going to have/use 'pgsqlms' for now, you can do that with:
>>>
>>> -> $ pcs node attribute node1 master-pgsqld="" # same for
>>> remaining nodes
>>
>> indeed.
>>
>>> now .. ! repeat your one-liner which worked just a moment
>>> ago and you should get the exact same or similar errors (while
>>> all nodes are stuck on 'slave')
>>
>> You have no promotion because your PostgreSQL instances have been
>> stopped in standby mode. The cluster has no way and no score to
>> promote one of them.
>>
>>> -> $ pcs resource debug-promote pgsqld
>>> crm_resource: Error performing operation: Error occurred
>>> Operation force-promote for pgsqld (ocf:heartbeat:pgsqlms)
>>> returned 1 (error: Can not get current node LSN location)
>>> /tmp:5432 - accepting connections
>>
>> NEVER use "debug-promote" or other "debug-*" commands with pgsqlms, or
>> any other cloned resources. AFAIK, these commands work fine for
>> "stateless" resources, but do not (could not) create the required
>> environment for the clone and multi-state ones.
>>
>> So I repeat, NEVER use "debug-promote".
>>
>> What you want to do is set the promotion score on the node where you
>> want the promotion to happen. E.g.:
>>
>>    pcs node attribute srv1 master-pgsqld=1001
>>
>> You can use "crm_attribute" or "crm_master" as well.
>>
>>> ocf-exit-reason:Can not get current node LSN location
>>
>> This one is probably because of "debug-promote".
>>
>>> You have to 'cib-push' to "fix" this very problem.
>>> In my (admin's) opinion this is a 100% candidate for a bug -
>>> whether in PCS or PAF - perhaps the authors may wish to comment?
>>
>> Removing the node attributes with the resource might be legit from
>> the
>> Pacemaker point of view, but I'm not sure how they can track the
>> dependency
>> (ping Ken?).
> 
> Higher-level tools like pcs or crm shell could probably do it when
> removing the resource (i.e. if the resource was a promotable clone,
> check for and remove any node attributes of the form master-$RSC_ID).
> That sounds like a good idea to me.

I put this on the pcs todo list.
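
Until that is implemented, the manual cleanup quoted above can be
scripted. This is only a minimal sketch, assuming the leftover attribute
follows the master-$RSC_ID form Ken describes. The function name
print_score_cleanup and the node name node2 are made up for
illustration. It only prints the pcs commands, so they can be reviewed
before being run by hand:

```shell
# Print (do not run) the pcs commands that clear the leftover promotion
# score of a deleted promotable clone on each given node.
# Usage: print_score_cleanup RSC_ID NODE...   (hypothetical helper)
print_score_cleanup() {
    rsc_id=$1
    shift
    for node in "$@"; do
        # An empty value asks pcs to remove the attribute, as in the
        # 'master-pgsqld=""' command quoted earlier in this thread.
        printf 'pcs node attribute %s master-%s=\n' "$node" "$rsc_id"
    done
}

# Example: resource "pgsqld" was deleted; node names are placeholders.
print_score_cleanup pgsqld node1 node2
```

Printing instead of executing is deliberate here: the list of nodes and
the resource id are supplied by the admin, so a typo cannot silently
remove the score of a resource that still exists.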

Regards,
Tomas

> 
> Pacemaker would be a bad place to do it because Pacemaker only sees the
> newly modified CIB with the resource configuration gone -- it can't
> know for sure whether it was a promotable clone, and it can only know
> it existed at all if there are leftover status entries (causing the
> resource to be listed as "orphaned"), which isn't guaranteed.
> 
>>
>> PAF has no way to know the resource is being deleted and cannot
>> remove its node attribute beforehand.
>>
>> Maybe PCS can look for promotion scores and remove them during the
>> "resource delete" command (ping Tomas)?
>>
>> Regards,
>>


