[ClusterLabs] PAF with postgresql 13?

Tue Mar 8 16:33:04 EST 2022

On Tue, 8 Mar 2022 17:44:36 +0000
lejeczek via Users <users at clusterlabs.org> wrote:

> On 08/03/2022 16:20, Jehan-Guillaume de Rorthais wrote:
> > Removing the node attributes with the resource might be legit from the
> > Pacemaker point of view, but I'm not sure how they can track the dependency
> > (ping Ken?).
> >
> > PAF has no way to know the resource is being deleted and cannot remove its
> > node attribute beforehand.
> >
> > Maybe PCS can look for promotion scores and remove them during the "resource
> > delete" command (ping Tomas)?
> bit of catch-22, no?

No, why? What's the emergency? 

Can you lose some data? No.

Is your service unable to be brought back quickly? No.

I hope I'm not missing something, but so far, it just looks like a
misunderstanding of how to correctly bring back the service.

Maybe there's something to improve, doc or code, but we first need to explain
what you see, what it means and what you should do. So, I'll try to explain
with some more gory details, sorry in advance.

> To those of us for whom it's a first foray into all "this" it
> might be (it was to me): I see "attributes" hanging on for
> no reason, the resource does not exist, so the first thing I want
> to do is a "cleanup" (natural), but for this to result in an
> inability to re-create the resource (all remain slaves) at a
> later time with a 'pcs' one-liner (no cib-push) is a no-no,
> it should not be...

The "no no" is: «don't use debug-promote and other debug-* commands, they don't
work with clones».

At least they don't work with pgsqlms, because these commands bypass the
cluster and don't set some _essential_ environment variables for clones that
the cluster usually sets.

You CAN recreate your resource with your one-liner. You just miss ONE
command to trigger a promotion. See the explanation below.

> ... because, at later time, why should not resource 
> re/creation with 'pcs resource create' bring back those 
> 'node attrs' ?

Because this is a different situation than when you first created the resource.
When you first created the resource, there was a primary and at least one
standby. On resource creation, pgsqlms detects the primary and sets its
promotion score to 1 (not even 1001, 1000, ..., just 1). Then, all the magic
happens from this very small seed.

Note that we are able to know if an instance is a primary or a secondary, even
when it is stopped, by reading one of its internal files.
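To illustrate the detection step, here is a hypothetical Python sketch. It is not pgsqlms code. It assumes the role of a stopped instance can be read from the "Database cluster state" line printed by PostgreSQL's `pg_controldata` tool: "shut down" for a cleanly stopped primary, "shut down in recovery" for a cleanly stopped standby. The function names `classify_stopped_instance` and `seed_scores` are made up for this example.

```python
import re
from typing import Dict, Optional

# Hypothetical mapping from the pg_controldata "Database cluster state" value
# to the role of a cleanly stopped instance.
STATE_TO_ROLE = {
    "shut down": "primary",                # primary stopped gently
    "shut down in recovery": "secondary",  # standby stopped gently
}


def classify_stopped_instance(controldata_output: str) -> Optional[str]:
    """Return "primary", "secondary", or None if not a clean shutdown."""
    match = re.search(r"^Database cluster state:\s*(.+?)\s*$",
                      controldata_output, re.MULTILINE)
    if match is None:
        return None
    return STATE_TO_ROLE.get(match.group(1))


def seed_scores(controldata_by_node: Dict[str, str]) -> Dict[str, int]:
    """Give the single detected primary a promotion score of 1, nothing else."""
    primaries = [node for node, output in controldata_by_node.items()
                 if classify_stopped_instance(output) == "primary"]
    # Only seed when exactly one primary is found; otherwise no score at all.
    return {primaries[0]: 1} if len(primaries) == 1 else {}
```

After the resource has been destroyed and recreated, every instance reports "shut down in recovery". In that case `seed_scores` returns an empty dict: there is no seed to grow from.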

When you destroy the pgsqld resource, Pacemaker stops all the pgsql instances,
that means: "demote -> stop" for the primary, and "stop" for the secondaries.
Now, they are _all_ secondaries. Then, you clean the related node attributes,
which were designating the previous primary.

Now, on the second creation, pgsqlms only finds secondaries and is not able
to "choose" one. Because there's no promotion score, the cluster is not able to
promote one either. Pacemaker is all about scores. No scores, no actions.
From there, the cluster requires some human wisdom to choose one to promote and
give it a score with the command I gave in my previous message. Eg.:

  pcs node attribute srv2 master-pgsqld=1

Try it. Set the promotion score to 1 for one node. You'll see the cluster
react, and pgsqlms recompute all the new scores really quickly from there.
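"No scores, no actions" can be made concrete with a deliberately simplified Python model. It is not Pacemaker's real placement algorithm: constraints, stickiness, promoted-max and tie-breaking are all ignored. The model assumes that only instances with a positive promotion score are promotion candidates. The function name `pick_node_to_promote` is hypothetical.

```python
from typing import Dict, Optional


def pick_node_to_promote(scores: Dict[str, int]) -> Optional[str]:
    """Simplified model: promote the node with the highest positive score.

    Nodes without a score (or with a non-positive one) are not candidates.
    With no candidate at all, nothing is promoted and every instance stays
    a secondary.
    """
    candidates = {node: score for node, score in scores.items() if score > 0}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# After "resource delete", a cleanup and "resource create": no scores at all.
print(pick_node_to_promote({}))             # None: everyone stays a secondary
# After "pcs node attribute srv2 master-pgsqld=1":
print(pick_node_to_promote({"srv2": 1}))    # srv2
```

The single score of 1 is enough for the model to pick a node. In the real cluster, that first promotion is what lets pgsqlms recompute proper scores for every node.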

I don't expect users to read the source, but this is explained in comments, for
the developers' memory, see:

https://github.com/ClusterLabs/PAF/blob/master/script/pgsqlms#L1500

From the user perspective, this is documented here:

https://clusterlabs.github.io/PAF/configuration.html

  «
  Last but not least, during the very first startup of your cluster, the
  designated primary will be the only instance stopped gently as a primary. Take
  great care of this when you setup your cluster for the first time.
  »

I suppose I should document the command to set the promotion score somewhere...
maybe in the cookbook docs? What do you think?

Now, let's discuss. What to do when there's no promotion score around?

One possibility would be to check if all the instances are stopped at the same
level of data history (the LSN) and promote one of them randomly... But I bet it
would be annoying if one of them is not at the same level as the others, with
users wondering why one is promoted sometimes and none at other times.
Moreover, this would add some more complexity to the code, and complexity is
bad for high availability.
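The idea of promoting at random only when all stopped instances share the same LSN would look roughly like this hypothetical Python sketch. Neither function exists in pgsqlms. The sketch assumes LSNs in PostgreSQL's usual text form (`XXXXXXXX/XXXXXXXX`, two hexadecimal halves of a 64-bit position). The second usage case shows the confusing behavior described above: one divergent instance and nobody gets promoted.

```python
import random
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a text LSN such as "16/B374D848" into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def choose_primary_if_aligned(lsns: Dict[str, str],
                              rng: random.Random) -> Optional[str]:
    """Hypothetical policy: pick a random node only if all LSNs are equal."""
    positions = {node: parse_lsn(lsn) for node, lsn in lsns.items()}
    if not positions or len(set(positions.values())) != 1:
        return None  # instances diverge: refuse to guess
    return rng.choice(sorted(positions))


rng = random.Random()
print(choose_primary_if_aligned({"srv1": "0/3000060", "srv2": "0/3000060"}, rng))
print(choose_primary_if_aligned({"srv1": "0/3000060", "srv2": "0/3000028"}, rng))
```

The first call prints either srv1 or srv2. The second call prints None because one instance lags behind, so from the user's side the same procedure promotes someone on one day and no one on another.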

Note that, for various reasons, having a stopped cluster with instances at
different points in data history is not OK from the point of view of the
pgsqlms state machine.

The other possibility, which I have actually been musing on for a long time, is
to _remove_ this primary detection code. This would force the admin to pick one
primary explicitly by setting the score by hand on resource creation. This is
just a one-time command to add when you _create_ your resource, and at least
the procedure and cluster behavior would always be the same, no matter what you
do.

What are your thoughts? Any other ideas to discuss?

> Unless... 'pgsqlms' can only be done by 'cib-push' under 
> every circumstance - so, a) after manual node attr removal 
> b) clean cluster, very first no prior 'pgsqlms' resource 
> deployment.

Unless I'm missing something in your explanation, you really don't need
cib-push: neither to create the resource the very first time, nor the next
times. In fact, you can do everything without it. If you use it to promote one
node, this is actually the wrong way to do it. Just use:

  pcs node attribute srv2 master-pgsqld=1

I reproduced your scenario on my side. It just works with the above command.
Just replace the node name with the one you want.

A little digression about cib-push: to me, this command is just good and smart
practice, for multiple reasons:

* auditable
* simulation with "crm_simulate"
* validation with "crm_verify" or "pcs cluster verify -f ..."
* all changes in one transition, constraints applied immediately
* history
* ...

I just love cib-push, but it doesn't mean YOU need to use it with pgsqlms.
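For readers who do want the cib-push workflow, here is one possible shape of it, written as a hypothetical Python helper that only builds the command lines and runs nothing. The scratch file name `tmp.xml` and the helper names are assumptions. The `pcs_changes` entries stand for whatever pcs sub-commands you would normally run live (for example your `resource create` one-liner without the leading `pcs`). Each one is applied to the file with `pcs -f` instead of the live cluster.

```python
import shlex
from typing import List


def cib_push_workflow(cib_file: str,
                      pcs_changes: List[List[str]]) -> List[List[str]]:
    """Return the command lines of an offline edit / check / push cycle."""
    commands = [["pcs", "cluster", "cib", cib_file]]           # snapshot live CIB
    commands += [["pcs", "-f", cib_file, *change]               # edit offline
                 for change in pcs_changes]
    commands.append(["crm_verify", "--xml-file", cib_file])    # validate
    commands.append(["crm_simulate", "--simulate",             # preview transition
                     "--xml-file", cib_file])
    commands.append(["pcs", "cluster", "cib-push", cib_file])  # apply at once
    return commands


def as_script(commands: List[List[str]]) -> str:
    """Render the commands as shell lines, quoted safely (Python 3.8+)."""
    return "\n".join(shlex.join(command) for command in commands)


print(as_script(cib_push_workflow("tmp.xml", [["resource", "config"]])))
```

Keeping the edits in a file is what provides the audit trail, simulation and single-transition properties listed above. The live cluster only sees the final push.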

> yes, it would be great to have this 
> 'improvement/enhancement' in future releases incorporated - 
> those node attr removed @resource removal time.

Then we need to wait for Ken and Tomas to chime in :)

Regards,

BONUS: Thank you for reading all this brain dump. Here is a tip: you can
actually trigger a pgsql switchover by setting a promotion score large enough on
the desired destination node, without ban/move constraints. Eg.:

  pcs node attribute srv2 master-pgsqld=2000

Pacemaker is all about scores, use with caution.
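Here "large enough" means higher than the score currently held by the primary. A minimal Python model of the switchover tip, under the same simplifying assumptions as before: the highest positive score wins, and stickiness, constraints and the real tie-breaking rules are ignored. The starting scores are hypothetical values picked for the example, and the helper names are made up.

```python
from typing import Dict, Optional


def promoted_node(scores: Dict[str, int]) -> Optional[str]:
    """Simplified model: the node with the highest positive score is primary."""
    positive = {node: score for node, score in scores.items() if score > 0}
    return max(positive, key=positive.get) if positive else None


def set_promotion_score(scores: Dict[str, int], node: str,
                        value: int) -> Dict[str, int]:
    """Return a copy of the scores with one node's value overridden,
    mimicking "pcs node attribute <node> master-pgsqld=<value>"."""
    updated = dict(scores)
    updated[node] = value
    return updated


before = {"srv1": 1001, "srv2": 1000, "srv3": 990}  # hypothetical scores
after = set_promotion_score(before, "srv2", 2000)
print(promoted_node(before), "->", promoted_node(after))  # srv1 -> srv2
```

The same mechanism explains the warning: any score you set by hand directly competes with the primary's own score.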

BONUS 2: try to create your resource with your PostgreSQL cluster actually up
and replicating ;)