[ClusterLabs Developers] Resource Agent language discussion

Fri Aug 7 06:09:10 EDT 2015

Hi guys,

So we wrote a PostgreSQL resource agent with simpler code (it only supports
multi-state setups, on purpose) that is simpler to manage, understand and
configure, and that uses a different failover algorithm than the existing one.

After struggling with the recover action when the master goes down and can be
restarted (see the other thread "CRM trying to demote a stopped resource" on
clusterlabs-dev@), we now have a first two-node cluster capable of recovering
a master or failing it over to its slave.

We still have a lot of tests to run on clusters with 2+ slaves to validate the
promotion algorithm, but at least things are now moving in the right direction.

Now, I would like to discuss the language used to write an RA in Pacemaker.
I have never seen a discussion or page about this so far. HINT: I don't want to
discuss (or troll about) which language is best. I would like to know why
**ALL** the RAs are written in bash and whether there are traps (hidden deep in
ocf-shellfuncs, for instance) to avoid when using a different language. And is
it acceptable to include new libs for other languages?

We rewrote the RA in perl, mostly because of me. I was fed up with bash/sh
limitations AND syntax AND needless code complexity for some easy tasks AND
traps (return codes etc). In my opinion, bash/sh are fine if your RA code is
short and simple, which was mostly the case back in the days of heartbeat, which
was stateless only. But it became a nightmare with multi-state agents struggling
with complex code to fit Pacemaker's behavior. Have a look at the mysql or
pgsql agents.

Moreover, with bash, I had some weird behaviors (timeouts) from the RA between
runuser/su/sudo and systemd/pamd some months ago. All three of them have
implications or side effects deep in the system that you need to take care of.
Using a language able to seteuid/setuid after forking is a much more natural and
clean way to drop root privileges and start the daemon (PostgreSQL refuses to
start as root and is not able to drop its privileges to another system user
itself).
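
To make the fork-then-setuid idea concrete, here is a minimal sketch of the
pattern. It is written in Python only for illustration (it is not taken from
our perl RA), and the helper name run_as is made up for this example:

```python
import os


def run_as(uid, gid, argv):
    """Fork, drop privileges in the child, exec argv; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child process: it must never return into the caller's code.
        try:
            if os.geteuid() == 0:
                # Only root may clear the supplementary groups.
                os.setgroups([])
            # Change the group first: once the user id is dropped, the
            # process is no longer allowed to change its group id.
            os.setgid(gid)
            os.setuid(uid)
            os.execvp(argv[0], argv)
        except OSError as exc:
            os.write(2, f"run_as: {exc}\n".encode())
        os._exit(127)
    # Parent process: keeps its privileges and waits for the child.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

An agent running as root would look up the uid/gid of the PostgreSQL system
user (for example with pwd.getpwnam) and pass the start command as argv. No
runuser/su/sudo is involved, so no PAM session is opened along the way.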

Now, we are far from having enterprise-class certified code; our RA passed its
very first tests yesterday, but here is some quick feedback. The downside of
picking a language other than bash/sh is that there is no OCF module/library
available for it. This is quite inconvenient when you need the system-specific
variables or logging shortcuts that are only defined in ocf-shellfuncs (and, I
would guess, patched by packagers?).

For instance, I had to "capture" the values of $HA_SBIN_DIR and $HA_RSCTMP from
my perl code. Another example: our perl RA currently logs only to syslog. We
will probably have to rewrite ocf_log/ha_log/ha_debug in perl before publishing
the final code. Any tips about this?
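
One way to do such a "capture" from a non-shell language is to let sh source
ocf-shellfuncs and print the variables back. The sketch below shows the idea
in Python; it is not the code from our RA. The function name and the default
path are assumptions (the path is the usual ${OCF_ROOT}/lib/heartbeat location
with OCF_ROOT=/usr/lib/ocf, which packagers may change):

```python
import re
import subprocess

# Assumed default location; adjust to where your distribution installs it.
OCF_SHELLFUNCS = "/usr/lib/ocf/lib/heartbeat/ocf-shellfuncs"

# $1 is the file to source, the remaining arguments are variable names.
# Each value is printed followed by a NUL byte so spaces survive intact.
SNIPPET = r'''
. "$1" >/dev/null 2>&1 || exit 1
shift
for name in "$@"; do
    eval "value=\${$name-}"
    printf '%s\0' "$value"
done
'''


def capture_shell_vars(names, script=OCF_SHELLFUNCS):
    """Source `script` in sh and return {name: value} for the given names."""
    for name in names:
        # The names end up inside an eval, so accept plain identifiers only.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"not a shell variable name: {name!r}")
    out = subprocess.run(
        ["sh", "-c", SNIPPET, "capture", script, *names],
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    # The output ends with a NUL, so the last split element is empty.
    values = out.decode().split("\0")[:-1]
    return dict(zip(names, values))
```

Calling capture_shell_vars(["HA_SBIN_DIR", "HA_RSCTMP"]) once at startup
returns both values as a dict; a variable that the sourced file does not set
comes back as an empty string. The cost is one extra sh process per call.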

At some point, to have clean, portable and OS-agnostic code, I wonder how
much code we will have to port from ocf-shellfuncs to perl...

By the way, you'll find the code here:

  https://github.com/dalibo/pgsql-resource-agent/blob/ms_perl/pgsqlms     

It still needs a proper README, the FIXMEs taken care of and some housecleaning,
but any feedback would be appreciated as well.

Regards,