[ClusterLabs] service flap as nodes join and leave
aspiers at suse.com
Thu Apr 14 11:35:24 EDT 2016
Ken Gaillot <kgaillot at redhat.com> wrote:
> On 04/14/2016 09:33 AM, Christopher Harvey wrote:
> > MsgBB-Active is a dummy resource that simply returns OCF_SUCCESS on
> > every operation and logs to a file.
> That's a common mistake, and will confuse the cluster. The cluster
> checks the status of resources both where they're supposed to be running
> and where they're not. If status always returns success, the cluster
> won't try to start it where it should, and will continuously try to
> stop it elsewhere, because it thinks it's already running everywhere.
> It's essential that an RA distinguish between running
> (OCF_SUCCESS/OCF_RUNNING_MASTER), cleanly not running (OCF_NOT_RUNNING),
> and unknown/failed (OCF_ERR_*/OCF_FAILED_MASTER).
> See pacemaker's Dummy agent as an example/template:
> It touches a temporary file to know whether it is "running" or not.
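As a rough illustration of the point above (not the actual Dummy agent code), here is a minimal sketch of a pseudo-resource whose monitor reports real state instead of always succeeding. The exit code values are the standard OCF ones; the function names and the state file path are hypothetical, and a real agent would source ocf-shellfuncs, implement meta-data and validate-all, and keep its state file under the cluster's own temporary directory.

```shell
#!/bin/sh
# Sketch only: a pseudo-resource that tracks "running" via a state file.

# Standard OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Hypothetical state file location, chosen for illustration.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/MsgBB-Active.state}"

pseudo_start() {
    # Creating the file marks the resource as running on this node.
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

pseudo_stop() {
    # Removing the file marks it as cleanly stopped; stopping an
    # already stopped resource still succeeds.
    rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

pseudo_monitor() {
    # Report running only if start was actually called on this node.
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

pseudo_dispatch() {
    case "$1" in
        start)          pseudo_start ;;
        stop)           pseudo_stop ;;
        monitor|status) pseudo_monitor ;;
        *)              return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# When invoked with an action argument, behave like an agent script.
if [ -n "${1:-}" ]; then
    pseudo_dispatch "$1"
    exit $?
fi
```

With this shape, a probe on a node where the resource was never started returns OCF_NOT_RUNNING (7), so the cluster does not conclude it is active everywhere.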
Yes, I very recently discovered we had made a similar mistake which
was confusing Pacemaker into thinking a pseudo-resource was running
everywhere, whereas we actually only wanted it running active/passive.
This was the fix:
> ocf-shellfuncs has a ha_pseudo_resource() function that does the same
> thing. See the ocf:heartbeat:Delay agent for example usage.
Interesting, thanks. I didn't know that.