[ClusterLabs] multiple resources - pgsqlms - and IP(s)

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Fri Jan 6 18:26:16 EST 2023


On Wed, 4 Jan 2023 11:15:06 +0100
Tomas Jelinek <tojeline at redhat.com> wrote:

> Dne 04. 01. 23 v 8:29 Reid Wahl napsal(a):
> > On Tue, Jan 3, 2023 at 10:53 PM lejeczek via Users
> > <users at clusterlabs.org> wrote:  
> >>
> >>
> >>
> >> On 03/01/2023 21:44, Ken Gaillot wrote:  
> >>> On Tue, 2023-01-03 at 18:18 +0100, lejeczek via Users wrote:  
> >>>> On 03/01/2023 17:03, Jehan-Guillaume de Rorthais wrote:  
> >>>>> Hi,
> >>>>>
> >>>>> On Tue, 3 Jan 2023 16:44:01 +0100
> >>>>> lejeczek via Users <users at clusterlabs.org> wrote:
> >>>>>  
> >>>>>> To get/have Postgresql cluster with 'pgsqlms' resource, such
> >>>>>> cluster needs a 'master' IP - what do you guys do when/if
> >>>>>> you have multiple resources off this agent?
> >>>>>> I wonder if it is possible to keep just one IP and have all
> >>>>>> those resources go to it - probably 'scoring' would be very
> >>>>>> tricky then, or perhaps not?  
> >>>>> That would mean all promoted pgsql instances MUST be on the same
> >>>>> node at any time. If one of your instances runs into trouble and
> >>>>> needs to failover, *ALL* of them would failover.
> >>>>>
> >>>>> This implies not just a small failure time window for one instance,
> >>>>> but for all of them, all the users.
> >>>>>  
> >>>>>> Or you do separate IP for each 'pgsqlms' resource - the
> >>>>>> easiest way out?  
> >>>>> That looks like a better option to me, yes.
> >>>>>
> >>>>> Regards,  
> >>>> Not related - Is this an old bug?:
> >>>>  
> >>>> -> $ pcs resource create pgsqld-apps ocf:heartbeat:pgsqlms  
> >>>> bindir=/usr/bin pgdata=/apps/pgsql/data op start timeout=60s
> >>>> op stop timeout=60s op promote timeout=30s op demote
> >>>> timeout=120s op monitor interval=15s timeout=10s
> >>>> role="Master" op monitor interval=16s timeout=10s
> >>>> role="Slave" op notify timeout=60s meta promotable=true
> >>>> notify=true master-max=1 --disable
> >>>> Error: Validation result from agent (use --force to override):
> >>>>      ocf-exit-reason:You must set meta parameter notify=true
> >>>> for your master resource
> >>>> Error: Errors have occurred, therefore pcs is unable to continue  
> >>> pcs now runs an agent's validate-all action before creating a resource.
> >>> In this case it's detecting a real issue in your command. The options
> >>> you have after "meta" are clone options, not meta options of the
> >>> resource being cloned. If you just change "meta" to "clone" it should
> >>> work.  
> >> Nope. Exact same error message.
> >> If I remember correctly, there was a bug specifically
> >> pertaining to 'notify=true'
> > 
> > The only recent one I can remember was a core dump.
> > - Bug 2039675 - pacemaker coredump with ocf:heartbeat:mysql resource
> > (https://bugzilla.redhat.com/show_bug.cgi?id=2039675)
> > 
> >  From a quick inspection of the pcs resource validation code
> > (lib/pacemaker/live.py:validate_resource_instance_attributes_via_pcmk()),
> > it doesn't look like it passes the meta attributes. It only passes the
> > instance attributes. (I could be mistaken.)
> > 
> > The pgsqlms resource agent checks the notify meta attribute's value as
> > part of the validate-all action. If pcs doesn't pass the meta
> > attributes to crm_resource, then the check will fail.
> >   
> 
> Pcs cannot pass meta attributes to crm_resource, because there is 
> nowhere to pass them to.

But they are passed as environment variables by Pacemaker, so why couldn't pcs
set them as well when running the agent?
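
For illustration only (paths and values here are just examples, not what pcs
actually runs): when Pacemaker executes an agent, meta attributes arrive as
OCF_RESKEY_CRM_meta_* environment variables, so a validation call could export
them the same way before invoking the agent:

  # hypothetical manual validate-all run, assuming the usual
  # resource-agents install path for pgsqlms:
  OCF_ROOT=/usr/lib/ocf \
  OCF_RESKEY_bindir=/usr/bin \
  OCF_RESKEY_pgdata=/apps/pgsql/data \
  OCF_RESKEY_CRM_meta_notify=true \
  /usr/lib/ocf/resource.d/heartbeat/pgsqlms validate-all

Without OCF_RESKEY_CRM_meta_notify=true in the environment, and assuming the
rest of the setup validates, the same call would stop on the "You must set
meta parameter notify=true" exit reason quoted above.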

> As defined in OCF 1.1, only instance attributes 
> matter for validation, see 
> https://github.com/ClusterLabs/OCF-spec/blob/main/ra/1.1/resource-agent-api.md#check-levels

It doesn't clearly state that meta attributes must be ignored by the agent
during these actions.

And one could argue that checking a meta attribute is a purely internal setup
check, at level 0.
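
To make that concrete, here is a rough shell-style sketch of how I read the
check levels (pgsqlms itself is written in Perl, so this only illustrates the
idea, not its actual code):

  # ocf_exit_reason and the OCF_* return codes come from the
  # resource-agents shell library
  . "${OCF_FUNCTIONS_DIR:-${OCF_ROOT}/lib/heartbeat}/ocf-shellfuncs"

  validate_all() {
      # level 0: checks that only look at the configuration handed to the
      # agent, which is where a meta attribute check like notify would sit
      if [ "${OCF_RESKEY_CRM_meta_notify:-false}" != "true" ]; then
          ocf_exit_reason "You must set meta parameter notify=true for your master resource"
          return "$OCF_ERR_CONFIGURED"
      fi

      # higher check levels would go on to inspect the node itself
      if [ "${OCF_CHECK_LEVEL:-0}" -ge 10 ] &&
         [ ! -d "${OCF_RESKEY_pgdata}" ]; then
          ocf_exit_reason "PGDATA ${OCF_RESKEY_pgdata} does not exist"
          return "$OCF_ERR_INSTALLED"
      fi

      return "$OCF_SUCCESS"
  }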

> The agents are bugged - they depend on meta data being passed to 
> validation. This is already tracked and being worked on:
> 
> https://github.com/ClusterLabs/resource-agents/pull/1826

The pgsqlms resource agent checks the OCF_RESKEY_CRM_meta_notify environment
variable before raising this error.

The pgsqlms resource agent relies on the notify action to perform some
important checks and actions. Without notifications, the resource will simply
behave incorrectly, so this is an essential check.

However, I've been considering running some of these checks only during the
probe action. Would that make sense? The notify check could move there, as
there's no need to check it on a regular basis.
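
A minimal shell-style sketch of what I have in mind (again only illustrative,
the real agent being Perl): a probe is just a monitor call with interval=0,
so the notify check could be restricted to that one-off call:

  # same include as above for ocf_is_probe, ocf_exit_reason and the
  # OCF_* return codes
  . "${OCF_FUNCTIONS_DIR:-${OCF_ROOT}/lib/heartbeat}/ocf-shellfuncs"

  monitor() {
      if ocf_is_probe; then
          # one-off probe: a reasonable place for setup-level validation
          if [ "${OCF_RESKEY_CRM_meta_notify:-false}" != "true" ]; then
              ocf_exit_reason "You must set meta parameter notify=true for your master resource"
              return "$OCF_ERR_CONFIGURED"
          fi
      fi

      # the regular checks of the PostgreSQL instance follow here
      return "$OCF_SUCCESS"
  }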

Thanks,

