[ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Wed Apr 28 13:19:14 EDT 2021


On Wed, 28 Apr 2021 12:00:40 -0500
Ken Gaillot <kgaillot at redhat.com> wrote:

> On Wed, 2021-04-28 at 18:14 +0200, Jehan-Guillaume de Rorthais wrote:
> > Hi all,
> > 
> > It seems to me the concern raised by Ulrich hasn't been discussed:
> > 
> > On Wed, 12 Apr 2021 Ulrich Windl wrote:
> >   
> > > Personally I think an RA calling crm_mon is inherently broken: Will
> > > it ever
> > > pass ocf-tester?  
> 
> Calling the command-line tools in an agent can be OK in some cases. The
> main concerns are:
> 
> * Time-of-check/time-of-use: cluster status can change immediately, so
> the agent should behave reasonably if a query result is incorrect at
> the moment it's used. Ideally there would be no case where the agent
> could incorrectly report success for an action.
> 
> * No commands that *change* the configuration (other than setting node
> attributes) should ever be used. Otherwise there's a potential for an
> infinite loop between the agent and scheduler.
> 
> * It's best to use tools' XML output when available, because that
> should be stable across Pacemaker releases, while the text output may
> not be. Aside from crm_mon, XML output is a recent addition, so some
> consideration must be given to backward compatibility and/or requiring
> a minimum Pacemaker version.
> 
> * Only the configuration section of the CIB has a guaranteed schema.
> The status section can theoretically change from release to release,
> although in practice it has changed very little over the years.
> 
> I don't use ocf-tester so I can't speak to that, but I suspect it could
> work if you exported a CIB_file variable with a sample cluster status
> beforehand. (CIB_file makes the cluster commands act as if the
> specified file is the live CIB at the moment.)
> 
> > Would it be possible to rely on the following command ?
> > 
> >   cibadmin --query --xpath "//status/node_state[@join='member']" | \
> >     grep -Po 'uname="\K[^"]+'
> > 
> > 
> > Regards,  
> 
> Only full cluster nodes will have a "join" attribute, so that query
> won't catch active remote nodes or guest nodes. Whether that's good or
> bad depends on what you're looking for.

That was an example of replacing the crm_mon dependency with a cibadmin one.
As far as I understand this agent, it uses crm_mon to:

* look for the node hosting the promoted clone
* check whether a node exists
* check whether a node has fully joined

All of these uses seem achievable by parsing the status section of the
cibadmin output (or by querying it with --xpath).
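
For illustration, here is a minimal sketch of the two node-related lookups
(node existence and node fully joined). The function names and the sample XML
are hypothetical and not taken from the pgsql RA. The sketch assumes cibadmin
prints each node_state opening tag on a single line, and it does not attempt
the promoted-clone lookup, which needs the resource operation history.

```shell
# Hypothetical helpers, not taken from the pgsql RA. Both read status XML on
# stdin, so they can be tried on a saved file as well as on cibadmin output.

# Print the uname of each node_state element whose join attribute is "member".
joined_nodes() {
    grep -o '<node_state [^>]*>' \
        | grep ' join="member"' \
        | sed -n 's/.* uname="\([^"]*\)".*/\1/p'
}

# Succeed if a node_state element exists for the node named in $1.
node_exists() {
    grep -o '<node_state [^>]*>' | grep -qF " uname=\"$1\""
}

# Hand-written, simplified sample; real node_state elements carry more
# attributes than shown here.
SAMPLE_STATUS='<status>
  <node_state id="1" uname="node1" crmd="online" join="member" expected="member"/>
  <node_state id="2" uname="node2" crmd="offline" join="down" expected="down"/>
</status>'

printf '%s\n' "$SAMPLE_STATUS" | joined_nodes
```

On a live cluster the same helpers would presumably be fed with something
like: cibadmin --query --xpath "//status/node_state" | joined_nodes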

> The plus side is that it's a query and it returns XML.

Indeed.

> The downsides are that node status can change quickly, so it could
> theoretically be inaccurate a moment later when you use it, and the
> status section is not guaranteed to stay in that format (though I
> expect that particular part will).

There are already version checks in the pgsql RA code for crm_mon anyway,
relying on OCF_RESKEY_crm_feature_set.
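
A rough sketch of what such a gate could look like follows. It is not the
code from the pgsql RA: version_at_least, MIN_FEATURE_SET and status_source
are hypothetical names, the threshold is a placeholder, and the comparison
assumes GNU "sort -V" is available. An agent would more likely use the
version comparison helper shipped in ocf-shellfuncs.

```shell
# Hypothetical helper: succeed if dotted version $1 is greater than or equal
# to dotted version $2. Relies on "sort -V" (GNU coreutils), an assumption.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder threshold, not a value taken from Pacemaker.
MIN_FEATURE_SET="3.1.0"

# Pacemaker passes the feature set to the agent in this variable; fall back
# to "0" when it is unset so the older code path is chosen.
if version_at_least "${OCF_RESKEY_crm_feature_set:-0}" "$MIN_FEATURE_SET"; then
    status_source="cibadmin"
else
    status_source="crm_mon"
fi
```

Defaulting to the older path when the variable is missing keeps the agent's
behaviour unchanged on clusters that do not report a feature set.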

> A minor point: that query will return the entire node_state XML
> subtree; you can add -n/--no-children to return just the node_state
> element itself.

Nice!

I was playing with xmllint as well, for richer XPath support, but it would
add a hard dependency.

Regards,

