[Pacemaker] OCF_RESKEY_CRM_meta_{ordered,notify,interleave}

Andrew Beekhof andrew at beekhof.net
Mon Apr 2 06:32:18 EDT 2012


On Mon, Apr 2, 2012 at 8:05 PM, Florian Haas <florian at hastexo.com> wrote:
> On Mon, Apr 2, 2012 at 11:54 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>> On Fri, Mar 30, 2012 at 7:34 PM, Florian Haas <florian at hastexo.com> wrote:
>>> On Fri, Mar 30, 2012 at 1:12 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>> Because it was felt that RAs shouldn't need to know.
>>>> Those options change pacemaker's behaviour, not the RAs.
>>>>
>>>> But subsequently, in lf#2391, you convinced us to add notify since it
>>>> allowed the drbd agent to error out if they were not turned on.
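
(For reference, the check in the drbd agent boils down to something along
these lines; a paraphrased sketch, not a verbatim excerpt of the agent:)

    # bail out of validate early if the master/slave resource was
    # configured without notifications, since the agent depends on them
    if ! ocf_is_true "$OCF_RESKEY_CRM_meta_notify"; then
        ocf_log err "resource must be configured with notify=true"
        exit $OCF_ERR_CONFIGURED
    fi
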
>>>
>>> Yes, and for ordered the motivation is exactly the same. Let me give a
>>> bit of background info.
>>>
>>> I'm currently working on an RA for GlusterFS volumes (the server-side
>>> stuff, everything client side is already covered in
>>> ocf:heartbeat:Filesystem). GlusterFS volumes are composed of "bricks",
>>> and for every brick there's a separate process to be managed on each
>>> cluster node. When these brick processes fail, GlusterFS has no
>>> built-in way to recover, and that's where Pacemaker can be helpful.
>>>
>>> Obviously, you would run that RA as a clone, on however many nodes
>>> constitute your GlusterFS storage cluster.
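
(To make that concrete, the setup would look roughly like this in crm shell
syntax; the agent, resource and volume names are only illustrative, assuming
the agent ends up shipping as ocf:glusterfs:volume:)

    crm configure primitive p_volume_data ocf:glusterfs:volume \
        params volname="data" \
        op monitor interval="10s"
    crm configure clone cl_volume_data p_volume_data
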
>>>
>>> Now, while brick daemons can be _monitored_ individually, they can
>>> only be _started_ as part of the volume, with the "gluster volume
>>> start" command. And if we "start" a volume simultaneously on multiple
>>> nodes, GlusterFS just produces an error on all but one of them, and
>>> that error is also a generic one and not discernible from other errors
>>> by exit code (yes, you may rant).
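
(For anyone not familiar with the CLI, the server-side calls involved are
roughly the following; the volume name is illustrative:)

    gluster volume info data    # is the volume, and hence its bricks, meant to be running?
    gluster volume start data   # start it; concurrent invocations fail generically on all but one node
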
>>>
>>> So, whenever we need to start >1 clone instance, we run into this problem:
>>>
>>> 1. Check whether brick is already running.
>>> 2. No, it's not. Start volume (this leaves other bricks untouched, but
>>> fires up the brick daemons expected to run locally).
>>> 3. Grumble. A different node just did the same thing.
>>> 4. All but one fail on start.
>>>
>>> Yes, all this isn't necessarily wonderful design (the start volume
>>> command could block until volume operations have completed on other
>>> servers, or it could error out with a "try again" error, or it could
>>> sleep randomly before retrying, or something else), but as it happens
>>> configuring the clone as ordered makes all of this evaporate.
>>>
>>> And it simply would be nice to be able to check, during validate, whether
>>> clone ordering is enabled.
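
(Concretely, the check being asked for would mirror the existing notify check,
assuming Pacemaker exported the attribute, which today it does not; that is
the point of this thread:)

    # in validate: refuse to run as a clone unless clone ordering is enabled
    if [ -n "$OCF_RESKEY_CRM_meta_clone_max" ] && \
       ! ocf_is_true "$OCF_RESKEY_CRM_meta_ordered"; then
        ocf_log err "this agent must be cloned with ordered=true"
        exit $OCF_ERR_CONFIGURED
    fi
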
>>>
>>>> I'd need more information. The RA shouldn't need to care, I would have
>>>> thought. The ordering happens in the PE/crmd; the RA should just do
>>>> what it's told.
>>>
>>> Quite frankly, I don't quite get this segregation of "meta attributes
>>> we expect to be relevant to the RA"
>>
>> The number of which is supposed to be zero.
>> I'm not sure "cutting down on questions to the mailing list" is a good
>> enough reason for adding additional exceptions.
>
> Well, but you did read the technical reason I presented here?

Yes, and it boiled down to "don't let the user hang themselves".
Which is a noble goal, I just don't like the way we're achieving it.

Why not advertise the requirements in the metadata somehow?
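
Purely as a hypothetical sketch of what "advertise in the metadata" could
mean (no such element exists in the OCF metadata schema today), the agent
might emit something like:

    # hypothetical: advertise required clone meta attributes as part of
    # the agent's metadata (invented element, not part of any OCF schema)
    advertise_clone_requirements() {
        printf '%s\n' \
            '<requires>' \
            '  <clone_meta name="ordered" value="true"/>' \
            '  <clone_meta name="notify" value="true"/>' \
            '</requires>'
    }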

>
>> The one truly valid exception in my mind is globally-unique, since the
>> monitor operation has to work quite differently.
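
(The reason for that, roughly sketched; the helper names below are made up:)

    # monitor must behave differently for globally-unique clones: each
    # instance (rsc:0, rsc:1, ...) owns its own slice of the resource,
    # whereas anonymous clone instances all look the same on a node
    if ocf_is_true "$OCF_RESKEY_CRM_meta_globally_unique"; then
        monitor_this_instance_only    # hypothetical helper
    else
        monitor_shared_service        # hypothetical helper
    fi
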
>
> Why are we not supposed to check for things like notify, ordered, allow-migrate?
>
>> My concern with providing them all to RAs is that someone will
>> probably start abusing them.
>
> _Everything_ about an RA can be abused. Why is that any concern of
> yours? You can't possibly enforce, from Pacemaker, that an RA actually
> does what it's supposed to do.

No, but I can take away the extra ammo :)

>
> Florian
>
> --
> Need help with High Availability?
> http://www.hastexo.com/now
>



