[ClusterLabs] Q: monitor and probe result codes and consequences

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu May 12 03:56:47 EDT 2016


Hi!

I have a question regarding an RA I wrote myself, running under pacemaker 1.1.12-f47ea56 (SLES11 SP4):

During a "probe", the "monitor" action of every resource is executed, regardless of any ordering constraints. At that point my RA treats one of its parameters as invalid ("file does not exist"), because the file is only provided once a supplying resource is up, and it returns rc=2.
OK, this may not be optimal, but pacemaker makes it worse: it does not repeat the probe when the resource is about to start, but keeps the recorded state, which prevents the resource from starting:

 primitive_monitor_0 on h05 'invalid parameter' (2): call=73, status=complete, exit-reason='none', last-rc-change='Wed May 11 17:03:39 2016', queued=0ms, exec=82ms

You might say that monitor should only return "success" or "not running", but I feel the RA should be able to report that the resource cannot run at all in the present state.

Shouldn't pacemaker reprobe resources before it tries to start them?

My RA had previously passed all the ocf-tester checks, so this situation is hard to test (unless you have a test cluster you can restart at any time).

(After a manual resource cleanup, the resource started as usual.)

My monitor uses the following logic:
---
    monitor|status)
        if validate; then
            set_variables
            check_resource || exit $OCF_NOT_RUNNING
            status=$OCF_SUCCESS
        else # cannot check status with invalid parameters
            status=$?
        fi
        exit $status
        ;;
---

Should I mess with ocf_is_probe?
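
To make the ocf_is_probe idea concrete, here is a rough sketch of what such a change might look like. It is only an illustration under several assumptions: the function name monitor_probe_aware and the variables CONFIG_FILE and PID_FILE are hypothetical, validate/set_variables/check_resource are simplified stand-ins for the RA's real helpers, and ocf_is_probe and the OCF_* codes are defined locally (approximating what ocf-shellfuncs provides) so the snippet runs on its own. It returns instead of exiting so it can be called from a test.

```shell
#!/bin/sh
# In a real RA these come from ocf-shellfuncs; defined here so the
# sketch is standalone.
: "${OCF_SUCCESS:=0}" "${OCF_ERR_ARGS:=2}" "${OCF_NOT_RUNNING:=7}"

# Stand-in approximating ocf_is_probe: a probe is a monitor action
# with interval 0.
ocf_is_probe() {
    [ "$__OCF_ACTION" = "monitor" ] && [ "$OCF_RESKEY_CRM_meta_interval" = "0" ]
}

# Hypothetical stand-ins for the RA's own helpers.
validate() { [ -e "$CONFIG_FILE" ] || return "$OCF_ERR_ARGS"; }
set_variables() { :; }
check_resource() { [ -e "$PID_FILE" ]; }

monitor_probe_aware() {
    if validate; then
        set_variables
        check_resource || return "$OCF_NOT_RUNNING"
        return "$OCF_SUCCESS"
    else
        rc=$?   # exit status of validate
        # Assumption: during a probe, a missing prerequisite is taken
        # to mean "not running on this node" rather than a hard error.
        ocf_is_probe && return "$OCF_NOT_RUNNING"
        return "$rc"
    fi
}
```

With this, a probe on a node where the file is not yet available would report "not running", while a recurring monitor would still report the validation error. The drawback is that a genuinely wrong parameter would also be hidden during probes.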

Regards,
Ulrich





