[Pacemaker] OCF_RESKEY_CRM_meta_{ordered,notify,interleave}

Florian Haas florian at hastexo.com
Mon Apr 2 06:05:19 EDT 2012


On Mon, Apr 2, 2012 at 11:54 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Fri, Mar 30, 2012 at 7:34 PM, Florian Haas <florian at hastexo.com> wrote:
>> On Fri, Mar 30, 2012 at 1:12 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>> Because it was felt that RAs shouldn't need to know.
>>> Those options change Pacemaker's behaviour, not the RAs'.
>>>
>>> But subsequently, in lf#2391, you convinced us to add notify, since that
>>> allowed the drbd agent to error out if notifications were not turned on.
>>
>> Yes, and for ordered the motivation is exactly the same. Let me give a
>> bit of background info.
>>
>> I'm currently working on an RA for GlusterFS volumes (the server-side
>> stuff, everything client side is already covered in
>> ocf:heartbeat:Filesystem). GlusterFS volumes are composed of "bricks",
>> and for every brick there's a separate process to be managed on each
>> cluster node. When these brick processes fail, GlusterFS has no
>> built-in way to recover, and that's where Pacemaker can be helpful.
>>
>> Obviously, you would run that RA as a clone, on however many nodes
>> constitute your GlusterFS storage cluster.
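
Just to make the setup concrete, the configuration I have in mind
would look roughly like this (the RA name, the resource IDs and the
"volname" parameter are placeholders; the agent is still work in
progress):

  # Illustrative only; substitute whatever the finished agent ends up
  # calling itself and its parameters.
  crm configure primitive p_gluster_vol ocf:glusterfs:volume \
    params volname=myvol op monitor interval=10s
  crm configure clone cl_gluster_vol p_gluster_vol
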
>>
>> Now, while brick daemons can be _monitored_ individually, they can
>> only be _started_ as part of the volume, with the "gluster volume
>> start" command. And if we "start" a volume simultaneously on multiple
>> nodes, GlusterFS just produces an error on all but one of them, and
>> that error is also a generic one, not distinguishable from other errors
>> by its exit code (yes, you may rant).
>>
>> So, whenever we need to start >1 clone instance, we run into this problem:
>>
>> 1. Check whether brick is already running.
>> 2. No, it's not. Start volume (this leaves other bricks untouched, but
>> fires up the brick daemons expected to run locally).
>> 3. Grumble. A different node just did the same thing.
>> 4. All but one fail on start.
>>
>> Yes, all this isn't necessarily wonderful design (the start volume
>> command could block until volume operations have completed on other
>> servers, or it could error out with a "try again" error, or it could
>> sleep randomly before retrying, or something else), but as it happens
>> configuring the clone as ordered makes all of this evaporate.
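
To make the failure mode concrete, the start path boils down to
roughly this (brick_running() and the "volname" parameter are
placeholders for whatever the agent will actually use):

  gluster_volume_start() {
      # 1. Check whether the local brick daemons are already up.
      if brick_running; then
          return $OCF_SUCCESS
      fi
      # 2. Start the volume. This leaves bricks on other nodes alone,
      #    but fires up the brick daemons expected to run locally. If
      #    another node runs the same command at the same moment (3.),
      #    gluster prints only a generic error on all but one of them,
      #    so those instances fail on start (4.).
      gluster volume start "$OCF_RESKEY_volname" || return $OCF_ERR_GENERIC
      return $OCF_SUCCESS
  }

With ordered=true on the clone, Pacemaker starts the instances one
after another, so step 3 simply cannot happen.
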
>>
>> And it simply would be nice to be able to check whether clone ordering
>> is enabled, during validate.
>>
>>> I'd need more information.  The RA shouldn't need to care, I would have
>>> thought. The ordering happens in the PE/crmd; the RA should just do
>>> what it's told.
>>
>> Quite frankly, I don't quite get this segregation of "meta attributes
>> we expect to be relevant to the RA"
>
> The number of which is supposed to be zero.
> I'm not sure "cutting down on questions to the mailing list" is a good
> enough reason for adding additional exceptions.

Well, but you did read the technical reason I presented here, didn't you?

> The one truly valid exception in my mind is globally-unique, since the
> monitor operation has to work quite differently.
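
Agreed, and that one works precisely because the RA can branch on the
meta attribute; roughly like this (the two helpers are made up):

  if ocf_is_true "${OCF_RESKEY_CRM_meta_globally_unique:-false}"; then
      monitor_this_instance_only    # check the piece this instance owns
  else
      monitor_whole_service         # anonymous clone: any instance will do
  fi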

Why are we not supposed to check for things like notify, ordered, allow-migrate?
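
All I want is to be able to do the same kind of validate-time sanity
check for those, too. The drbd agent already does the equivalent of the
first check below; the second is what currently isn't possible, because
ordered isn't passed to the RA (a sketch only, using the usual
ocf-shellfuncs helpers):

  if ! ocf_is_true "${OCF_RESKEY_CRM_meta_notify:-false}"; then
      ocf_log err "This agent must run as a clone with notify=true."
      return $OCF_ERR_CONFIGURED
  fi
  # This is the part that needs OCF_RESKEY_CRM_meta_ordered exported:
  if ! ocf_is_true "${OCF_RESKEY_CRM_meta_ordered:-false}"; then
      ocf_log err "This agent must run as a clone with ordered=true."
      return $OCF_ERR_CONFIGURED
  fi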

> My concern with providing them all to RAs is that someone will
> probably start abusing them.

_Everything_ about an RA can be abused. Why is that any concern of
yours? You can't possibly enforce, from Pacemaker, that an RA actually
does what it's supposed to do.

Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



