[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Lars Ellenberg lars.ellenberg at linbit.com
Tue Sep 26 16:10:05 EDT 2017


On Mon, May 22, 2017 at 12:26:36PM -0500, Ken Gaillot wrote:
> Resurrecting an old thread, because I stumbled on something relevant ...

/me too :-)

> There had been some discussion about having the ability to run a more
> useful monitor operation on an otherwise systemd-based resource. We had
> talked about a couple approaches with advantages and disadvantages.
> 
> I had completely forgotten about an older capability of pacemaker that
> could be repurposed here: the (undocumented) "container" meta-attribute.

Which is nice to know.

The wrapper approach is appealing as well, though.

I have just implemented a PoC ocf:pacemaker:systemd "wrapper" RA,
to give my brain something different to do for a change.

Takes two parameters,
unit=(systemd unit), and
monitor_hook=(some executable)

The monitor_hook has access to the environment, obviously,
in case it needs that. For monitor, it will only be called
if "systemctl is-active" thinks the thing is active.

It is expected to return 0 (OCF_SUCCESS) for "running",
and 7 (OCF_NOT_RUNNING) for "not running". It may return anything else;
all exit codes are propagated directly for the "monitor" action.
"Unexpected" exit codes are logged with ocf_exit_reason
(does that make sense?).
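A hook that follows this contract could look like the sketch below. It is only an illustration under assumptions not in the original: an SMTP service on 127.0.0.1:25 (as in the postfix test further down), the hypothetical names `smtp_hook`, `HOOK_HOST` and `HOOK_PORT`, and bash's /dev/tcp plus coreutils' `timeout` in place of nc. Besides 0 and 7 it can return 5 (OCF_ERR_INSTALLED) when `timeout` is missing, as an example of a code the RA would pass through and log as unexpected.

```bash
#!/bin/bash
# Sketch of a monitor_hook following the 0/7 contract.
# Assumptions: SMTP service on 127.0.0.1:25; HOOK_HOST/HOOK_PORT are
# hypothetical overrides. Defaults let the block run outside the RA.
: "${OCF_SUCCESS:=0}" "${OCF_ERR_INSTALLED:=5}" "${OCF_NOT_RUNNING:=7}"

smtp_hook() {
	local host=${HOOK_HOST:-127.0.0.1} port=${HOOK_PORT:-25} greeting

	# A missing tool is a local problem, not "service down": return a code
	# outside 0/7 so the RA propagates it and logs it as unexpected.
	command -v timeout >/dev/null 2>&1 || return "$OCF_ERR_INSTALLED"

	# Connect via bash's /dev/tcp and read the first line (the greeting).
	# The inner shell exits non-zero if the connection or the read fails;
	# timeout kills it if the server accepts but never answers.
	greeting=$(timeout 5 bash -c '
		exec 3<>"/dev/tcp/$1/$2" || exit 1
		IFS= read -r line <&3 || exit 1
		printf "%s\n" "$line"
	' _ "$host" "$port" 2>/dev/null) || return "$OCF_NOT_RUNNING"

	# An SMTP server that is ready greets with "220 ...".
	[[ $greeting == 220* ]] && return "$OCF_SUCCESS"
	return "$OCF_NOT_RUNNING"
}
```

As an actual monitor_hook, the function would go into an executable script ending in `smtp_hook; exit $?`.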

The systemctl start and stop commands apparently are "synchronous"
(have they always been? or only nowadays? is that relevant?),
but to be so, they need properly written unit files.
If a unit defines an ExecStop command which only triggers
stopping but does not wait for it, systemd cannot wait either
(it has no way to know what it should wait for in that case),
and no one should blame systemd for that.
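For a daemon whose own stop command only sends a signal, one way to give systemd something to wait for is an ExecStop= helper that signals the process and then polls until it is gone. A minimal sketch, assuming the daemon writes its PID to a PID file and exits on SIGTERM; `stop_and_wait` and its arguments are hypothetical:

```bash
#!/bin/bash
# Hypothetical ExecStop= helper: signal a daemon, then wait until it is gone.
# Assumptions: the daemon records its PID in a PID file and exits on SIGTERM.
# Usage: stop_and_wait PIDFILE [SECONDS]
stop_and_wait() {
	local pidfile=$1 secs=${2:-30} pid i
	[[ -r $pidfile ]] || return 0            # no PID file: treat as stopped
	pid=$(<"$pidfile")
	[[ $pid =~ ^[0-9]+$ ]] || return 1       # refuse to signal garbage

	# kill fails if the process is already gone (or cannot be signalled).
	kill -TERM "$pid" 2>/dev/null || return 0

	# Poll once per second. systemd waits for ExecStop= commands to finish,
	# so "systemctl stop" cannot return before this loop ends.
	for ((i = 0; i < secs; i++)); do
		kill -0 "$pid" 2>/dev/null || return 0
		sleep 1
	done
	return 1                                 # still running after $secs s
}
```

It returns 1 if the process survives the given number of seconds, so a slow stop is not silently reported as complete.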

That's why you would need to fix such systemd units,
but that's also why I added the additional _monitor loops
after systemctl start / stop.
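Those loops are open-ended; the script's own TODOs ask for configurable time/retries. One way to bound them, sketched here with a hypothetical `wait_for_rc` helper that is not part of the RA, is to retry once per second and give up after a fixed number of attempts:

```bash
#!/bin/bash
# Hypothetical helper: run a command once per second until it returns the
# wanted exit code, giving up after a number of attempts.
# Usage: wait_for_rc WANTED_RC ATTEMPTS COMMAND [ARGS...]
# Returns 0 on success, otherwise the command's last non-zero exit code
# (or 1 if that last code was 0 but a different code was wanted).
wait_for_rc() {
	local want=$1 tries=$2 rc
	shift 2
	while :; do
		"$@"
		rc=$?
		(( rc == want )) && return 0
		# Never report success after giving up, e.g. when waiting for 7.
		(( --tries > 0 )) || return $(( rc == 0 ? 1 : rc ))
		sleep 1
	done
}
```

Inside the RA, start could then end with `wait_for_rc 0 15 _monitor || exit $OCF_ERR_GENERIC` (15 staying below the advertised 20 s start timeout; a real version would take it from a parameter), and stop would wait for 7 in the same way. Without such a bound, only Pacemaker's operation timeout ends the loop.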

Maybe it should not be named systemd, but systemd-wrapper.

Other comments?

    Lars


So here is my RFC,
tested only "manually" via

for x in monitor stop monitor start monitor ; do
  for try in 1 2; do
    OCF_ROOT=/usr/lib/ocf \
    OCF_RESKEY_monitor_hook=/usr/local/bin/my-monitoring-hook \
    OCF_RESKEY_unit=postfix@- ./systemd $x ; echo $try. $x $?
  done
done

------ /usr/local/bin/my-monitoring-hook ----------------------------
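#!/bin/sh
# Running (0) if the local SMTP server answers with its "220" greeting,
# otherwise not running (7).
echo quit | nc 127.0.0.1 25 2>/dev/null | grep -q '^220' || exit 7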

----- /usr/lib/ocf/resource.d/pacemaker/systemd ---------------------
#!/bin/bash

: ${OCF_FUNCTIONS=${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs}
. ${OCF_FUNCTIONS}
: ${__OCF_ACTION=$1}


meta_data() {
	cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="systemd" version="1.0">
<version>1.0</version>

<longdesc lang="en">
This Resource Agent delegates start and stop to systemctl start and stop,
but monitor will, in addition to "systemctl is-active", also run the monitor_hook you specify.
</longdesc>
<shortdesc lang="en">systemd service with monitor hook</shortdesc>

<parameters>
<parameter name="unit" unique="0">
<longdesc lang="en">
What systemd unit to manage.
</longdesc>
<shortdesc lang="en">systemd unit</shortdesc>
<content type="string" />
</parameter>

<parameter name="monitor_hook" unique="0">
<longdesc lang="en">
What executable to run in addition to "systemctl is-active".
</longdesc>
<shortdesc lang="en">monitor hook</shortdesc>
<content type="string" />
</parameter>
</parameters>

<actions>
<action name="start"        timeout="20" />
<action name="stop"         timeout="20" />
<action name="monitor"      timeout="20" interval="10" depth="0"/>
<!--
<action name="reload"       timeout="20" />
-->
<action name="validate-all" timeout="20" />
<action name="meta-data"    timeout="5" />
</actions>
</resource-agent>
END
}

_monitor()
{
	local ex check
	if [[ -n "$OCF_RESKEY_monitor_hook" ]] &&
	   [[ -x "$OCF_RESKEY_monitor_hook" ]]; then
		"$OCF_RESKEY_monitor_hook"
		ex=$?
		# No-op; makes action/exit code visible in "bash -x" traces.
		: ==== $__OCF_ACTION/$ex ====
		case $__OCF_ACTION/$ex in
		stop/7) : "not running after stop: expected" ;;
		stop/*) ocf_exit_reason "returned exit code $ex after stop: $OCF_RESKEY_monitor_hook" ;;
		start/0) : "running after start: expected";;
		start/*) ocf_exit_reason "returned exit code $ex after start: $OCF_RESKEY_monitor_hook" ;;
		monitor/0|monitor/7) : "expected running (0) or not running (7)" ;;
		monitor/*)
			ocf_exit_reason "returned exit code $ex during monitor: $OCF_RESKEY_monitor_hook" ;;
		esac
		return $ex
	else
		ocf_exit_reason "missing or not executable: $OCF_RESKEY_monitor_hook"
	fi
	return $OCF_ERR_GENERIC
}

case $__OCF_ACTION in
meta-data) meta_data ;;
validate-all) : "Tbd. Maybe." ;;
stop)	systemctl stop "$OCF_RESKEY_unit" || exit $OCF_ERR_GENERIC
	# TODO make time/retries of monitor after stop configurable
	# Until then, only Pacemaker's stop timeout bounds this loop.
	while _monitor; do sleep 1; done
	exit $OCF_SUCCESS
	;;
start)	systemctl start "$OCF_RESKEY_unit" || exit $OCF_ERR_GENERIC
	# TODO make time/retries of monitor after start configurable
	# Until then, only Pacemaker's start timeout bounds this loop.
	while ! _monitor; do sleep 1; done
	exit $OCF_SUCCESS
	;;
monitor)
	systemctl is-active --quiet "$OCF_RESKEY_unit" || exit $OCF_NOT_RUNNING
	_monitor
	;;
*)
	ocf_exit_reason "not implemented: $__OCF_ACTION"
	exit $OCF_ERR_GENERIC
esac

exit $?
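The validate-all action above is still "Tbd. Maybe.". A minimal sketch of what it could check, assuming the two parameters from the meta-data; the function name `systemd_validate` is hypothetical, and it prints to stderr where the RA would use ocf_exit_reason:

```bash
#!/bin/bash
# Sketch only: defaults so the block runs outside the RA, where these
# constants would come from ocf-shellfuncs.
: "${OCF_SUCCESS:=0}" "${OCF_ERR_INSTALLED:=5}" "${OCF_ERR_CONFIGURED:=6}"

# Hypothetical validate-all for the wrapper RA.
systemd_validate() {
	# Missing parameters are configuration errors (same on every node).
	if [[ -z $OCF_RESKEY_unit ]]; then
		echo "parameter 'unit' is required" >&2
		return "$OCF_ERR_CONFIGURED"
	fi
	if [[ -z $OCF_RESKEY_monitor_hook ]]; then
		echo "parameter 'monitor_hook' is required" >&2
		return "$OCF_ERR_CONFIGURED"
	fi
	# Things that may differ per node are reported as "not installed".
	if [[ ! -x $OCF_RESKEY_monitor_hook ]]; then
		echo "not executable: $OCF_RESKEY_monitor_hook" >&2
		return "$OCF_ERR_INSTALLED"
	fi
	if ! command -v systemctl >/dev/null 2>&1; then
		echo "systemctl not found" >&2
		return "$OCF_ERR_INSTALLED"
	fi
	# "systemctl cat" fails if systemd finds no unit file for the name.
	if ! systemctl cat "$OCF_RESKEY_unit" >/dev/null 2>&1; then
		echo "unknown systemd unit: $OCF_RESKEY_unit" >&2
		return "$OCF_ERR_INSTALLED"
	fi
	return "$OCF_SUCCESS"
}
```

OCF_ERR_CONFIGURED is reserved for missing parameters because Pacemaker treats it as a fatal error for the resource on every node, while problems that may be local to one node are reported as OCF_ERR_INSTALLED.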



