[ClusterLabs] service flap as nodes join and leave

Ken Gaillot kgaillot at redhat.com
Thu Apr 14 15:12:54 UTC 2016


On 04/14/2016 09:33 AM, Christopher Harvey wrote:
> MsgBB-Active is a dummy resource that simply returns OCF_SUCCESS on
> every operation and logs to a file.

That's a common mistake, and will confuse the cluster. The cluster
checks the status of resources both where they're supposed to be running
and where they're not. If the monitor (status) action always returns
success, the cluster won't try to start the resource where it should,
and will continuously try to stop it elsewhere, because it thinks the
resource is already running everywhere.

It's essential that an RA distinguish between running
(OCF_SUCCESS/OCF_RUNNING_MASTER), cleanly not running (OCF_NOT_RUNNING),
and unknown/failed (OCF_ERR_*/OCF_FAILED_MASTER).
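
As a rough illustration of that three-way distinction, here is a minimal
sketch of a monitor function for a daemon that records its PID in a
file. The names example_monitor and PIDFILE are made up for this
example and don't come from any real agent; the numeric return codes
are the standard OCF values. Treating a stale PID file as a failure is
one possible choice here, not a rule.

```shell
#!/bin/sh
# Standard OCF return codes used by this sketch.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical PID file location (an assumption for illustration only).
PIDFILE="${PIDFILE:-${TMPDIR:-/tmp}/example-ra.pid}"

example_monitor() {
    # No PID file at all: the resource is cleanly stopped.
    [ -e "$PIDFILE" ] || return $OCF_NOT_RUNNING

    pid=$(cat "$PIDFILE" 2>/dev/null)

    # PID file exists but does not hold a number: state is unknown/failed.
    case "$pid" in
        ''|*[!0-9]*) return $OCF_ERR_GENERIC ;;
    esac

    # A process with the recorded PID can be signalled: report running.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # PID file left behind with no matching process: report a failure
    # (a design choice in this sketch).
    return $OCF_ERR_GENERIC
}
```

The point is only that each outcome maps to a different return code, so
the cluster can tell "running" from "stopped" from "broken".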

See pacemaker's Dummy agent as an example/template:

https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/Dummy

It touches a temporary file to know whether it is "running" or not.
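
The state-file idea can be sketched roughly as follows. This is not the
actual Dummy agent, just a stripped-down illustration of the same
approach; STATE_FILE and the dummy_* function names are hypothetical,
and a real agent would also need meta-data, validation, and an action
dispatcher.

```shell
#!/bin/sh
# Standard OCF return codes used by this sketch.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical state file path (an assumption for illustration only).
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/example-dummy.state}"

dummy_monitor() {
    # The state file's existence is the only record of "running".
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

dummy_start() {
    # Starting an already-running resource succeeds without doing anything.
    dummy_monitor && return $OCF_SUCCESS
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

dummy_stop() {
    # Stopping an already-stopped resource also counts as success.
    rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}
```

With this, monitor reports OCF_NOT_RUNNING until start has been called
on that node, and again after stop, which is what the cluster needs in
order to place the resource correctly.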

ocf-shellfuncs has a ha_pseudo_resource() function that does the same
thing. See the ocf:heartbeat:Delay agent for example usage.



