[ClusterLabs Developers] Resurrecting OCF

Jan Pokorný jpokorny at redhat.com
Thu Sep 22 05:28:33 EDT 2016

On 21/09/16 16:26 -0500, Ken Gaillot wrote:
> On 09/21/2016 10:55 AM, Jan Pokorný wrote:
>> On 21/09/16 14:50 +1000, Andrew Beekhof wrote:
>>> I like where this is going.
>>> Although I don’t think we want to get into the business of trying to
>>> script config changes from one agent to another, so I’d drop #4
>> Not agent parameter changes, just its specification -- to reflect
>> formally what the proposed symlink-based delegation scheme does when
>> the old one is still in use.  If the old and new are incompatible,
>> such automatic delegation is not possible anyway (that's one of
>> the reasons "description" would come in handy).
>> I see there's much bigger potential (parameter renames, ...) but for
>> that, each agent should be responsible on its own (somehow, subject
>> of further evolution).
>> Also, supposing there are more consumers of RA, the suggestion to
>> run the script should be more generic ("when used from under
>> pacemaker, ...").
>>> I would make .deprecated a nested directory so that if we want to
>>> retire (for example) a ClusterLabs agent in the future we can create
>>> .deprecated/clusterlabs/ and put the agent there. Rather than make
>>> this heartbeat specific.
>> Good point; it would also prevent clashes when single directory should
>> serve all the providers.
> I don't understand the desire to treat "deprecated" agents any
> differently. It should be sufficient to just mention it in their help
> text / man page / meta-data / other documentation. Pacemaker isn't going
> to run a "deprecated" agent any differently.

And I don't understand how you came to the conclusion that anything
changes from the outside view, besides an occasional note about
deprecation being emitted to the logs.

It'd be an implementation detail self-contained in resource-agents.

> When users see ocf:whatever:whatever, they know where to look for the
> script. Why frustrate them by making them waste time figuring out how a
> "nonexistent" RA is being used and finding it?

A symlink makes a clear connection and, besides, I proposed a "new-alias"
action.  I think you overestimate how often the agents are physically
investigated (I guess the project would have more committers if that
were the case).

> If the goal is to let users know that an agent is deprecated (which is
> the only reason that I can think of), then we can add an attribute in
> the meta-data, and UIs/pacemaker can report/log it if present.
>   <resource-agent
>     name="Evmsd"
>     deprecated="No longer actively maintained"
>   >
>>> I wonder if some of this should live in pacemaker itself though…
>> This runs directly to the other side of the RA-pacemaker bias,
>> pacemaker caring about RA evolutionary internals :-)
>> In the outlook, that would make any separated OCF standard efforts
>> worthless and we could just call it pacemaker resource standard
>> right away and forget about any sort of self-containment
>> (the proposed procedure aims to align with).
>> I am not sure that would be the best thing.
> Agreed, anything we come up with should be explicit in the OCF standard.
> But I think this behavior could be specified in the standard.

As the standard provides guarantees for outer interfacing, there's no
real need to externalize otherwise self-contained subtleties, in this
case beyond saying that symlinks to __formatted__ files should be
excluded from agent lists (which might be overridden on demand).

>>> If resources_action_create() cannot find ocf:${provider}:${agent} in
>>> its usual location, look up
>>> ${OCF_ROOT_DIR}/.compat/${provider}/__entries__
>>> Format for __entries__:
>>>    # old, replacement
>>>    # ${agent} , ${new_provider}:${new_agent} , ${description}
>>>    IPaddr , clusterlabs:IP , Replaced with different semantics
>>>    IPaddr2 , clusterlabs:IP , Moved
>>>    drbd , linbit:drbd , Moved
>>>    eDirectory , , Deleted
>> Additional "what happened" field might work well in the update
>> suggestions.
>>> Assuming an entry is found:
>>> - If .compat/${old_provider}/${old_agent} exists, notify the user
>>>    “somehow”, then call it.
>>> - Otherwise, return OCF_ERR_NOT_INSTALLED and use ${description} and
>>>   ${replacement} as the exit reason (which shows up in pcs status).
>>> Perhaps the “somehow” is creating PCMK_OCF_DEPRECATED (with the same
>>> semantics as PCMK_OCF_DEGRADED) and prepending ${description} to the
>>> output (assuming it's not a metadata op) and/or the exit reason[1].
>>> Maybe only on successful start operations to minimise the noise?
>>> [1] Shouldn’t be too hard with some extra fields for 'struct
>>>     svc_action_private_s’ or svc_action_t
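
To make the proposed __entries__ format concrete, here is a minimal Python sketch of a parser for it. It is only an illustration under assumptions: the names CompatEntry and parse_entries are hypothetical, the sample lines are the ones quoted above, and nothing like this exists in pacemaker or resource-agents as far as this thread shows.

```python
from typing import Dict, NamedTuple, Optional


class CompatEntry(NamedTuple):
    """One line of the proposed __entries__ file (hypothetical type)."""
    agent: str
    replacement: Optional[str]  # "provider:agent", or None when the agent was deleted
    description: str


def parse_entries(text: str) -> Dict[str, CompatEntry]:
    """Parse '${agent} , ${new_provider}:${new_agent} , ${description}' lines."""
    entries: Dict[str, CompatEntry] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first two commas only, so the description may contain commas.
        fields = [field.strip() for field in line.split(",", 2)]
        if len(fields) != 3:
            continue  # assumption: silently ignore malformed lines
        agent, replacement, description = fields
        entries[agent] = CompatEntry(agent, replacement or None, description)
    return entries


SAMPLE = """\
# old, replacement
# ${agent} , ${new_provider}:${new_agent} , ${description}
IPaddr , clusterlabs:IP , Replaced with different semantics
IPaddr2 , clusterlabs:IP , Moved
drbd , linbit:drbd , Moved
eDirectory , , Deleted
"""

entries = parse_entries(SAMPLE)
for entry in entries.values():
    print(entry.agent, "->", entry.replacement or "(no replacement)", "|", entry.description)
```

An empty replacement field (as for eDirectory) maps to None here, which would correspond to the "return OCF_ERR_NOT_INSTALLED" branch of the proposal.
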
> I like the idea of intelligent aliasing. I'm hoping we can do it without
> a separate directory structure and meta-meta-data files.
> What about continuing to use symlinks from the old name to the new name,
> and adding an aliases section to the agent meta-data?
>   <resource-agent name="IP" version="1.5">
>     <version>2.0</version>
>     <aliases>
>       <alias name="ocf:heartbeat:IPaddr"
>         reason="Replaced with different semantics"/>
>       <alias name="ocf:heartbeat:IPaddr2"
>         reason="Superseded by clusterlabs provider"/>
>     </aliases>
>   </resource-agent>
> The file would be where people expect to find it, and the intent would
> be readable.
> If pacemaker loads an RA's metadata and finds the configured name in the
> aliases section, it could log a warning with the reason.
> The only drawback I see is that there is no inherent coordination
> between the symlinks and the aliases. But either would be fine without
> the other, so I don't see that as serious.
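
For illustration, the alias lookup suggested above could look roughly like the following Python sketch. The <aliases> section is only a proposal in this thread and not part of any OCF revision; the function name alias_reason is hypothetical, and the sample meta-data assumes the <alias> elements are self-closed so that the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Sample meta-data following the proposal above (alias elements self-closed
# here, which is an assumption needed for the fragment to parse).
METADATA = """\
<resource-agent name="IP" version="1.5">
  <version>2.0</version>
  <aliases>
    <alias name="ocf:heartbeat:IPaddr"
      reason="Replaced with different semantics"/>
    <alias name="ocf:heartbeat:IPaddr2"
      reason="Superseded by clusterlabs provider"/>
  </aliases>
</resource-agent>
"""


def alias_reason(metadata_xml: str, configured_name: str) -> Optional[str]:
    """Return the alias reason if configured_name is listed as an alias, else None."""
    root = ET.fromstring(metadata_xml)
    for alias in root.findall("./aliases/alias"):
        if alias.get("name") == configured_name:
            return alias.get("reason", "")
    return None


reason = alias_reason(METADATA, "ocf:heartbeat:IPaddr2")
if reason is not None:
    # A consumer such as pacemaker could log this as a warning.
    print("warning: ocf:heartbeat:IPaddr2 is an alias:", reason)
```
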
>>>> On 19 Aug 2016, at 6:59 PM, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>> On 18/08/16 17:27 +0200, Klaus Wenninger wrote:
>>>>> On 08/18/2016 05:16 PM, Ken Gaillot wrote:
>>>>>> On 08/18/2016 08:31 AM, Kristoffer Grönlund wrote:
>>>>>>> Jan Pokorný <jpokorny at redhat.com> writes:
>>>>>>>> Thinking about that, ClusterLabs may be considered a brand established
>>>>>>>> well enough for "clusterlabs" provider to work better than anything
>>>>>>>> general such as previously proposed "core".  Also, it's not expected
>>>>>>>> there will be more RA-centered projects under this umbrella than
>>>>>>>> resource-agents (pacemaker deserves to be a provider on its own),
>>>>>>>> so it would be pretty unambiguous pointer.
>>>>>>> I like this suggestion as well.
>>>>>> Sounds good to me.
>>>>>>>> And for new, not well-tested agents within resource-agents, there could
>>>>>>>> also be a provider schema akin to "clusterlabs-staging" introduced.
>>>>>>>> 1 CZK
>>>>>>> ...and this too.
>>>>>> I'd rather not see this. If the RA gets promoted to "well-tested",
>>>>>> everyone's configuration has to change. And there's never a clear line
>>>>>> between "not well-tested" and "well-tested", so things wind up staying
>>>>>> in "beta" status long after they're widely used in production, which
>>>>>> unnecessarily makes people question their reliability.
>>>>>> If an RA is considered experimental, say so in the documentation
>>>>>> (including the man page and help text), and give it an "0.x" version number.
>>>>>>> Here is another one: While we are moving agents into a new namespace,
>>>>>>> perhaps it is time to clean up some of the legacy agents that are no
>>>>>>> longer recommended or of questionable quality? Off the top of my head,
>>>>>>> there are
>>>>>>> * heartbeat/Evmsd
>>>>>>> * heartbeat/EvmsSCC
>>>>>>> * heartbeat/LinuxSCSI
>>>>>>> * heartbeat/pingd
>>>>>>> * heartbeat/IPaddr
>>>>>>> * heartbeat/ManageRAID
>>>>>>> * heartbeat/vmware
>>>>>>> A pet peeve of mine would also be to move heartbeat/IPaddr2 to
>>>>>>> clusterlabs/IP, to finally get rid of that weird 2 in the name...
>>>>>> +1!!! (or is it -2?)
>>>>>>> Cheers,
>>>>>>> Kristoffer
>>>>>> Obviously, we need to keep the ocf:heartbeat provider around for
>>>>>> backward compatibility, for the extensive existing uses both in cluster
>>>>>> configurations and in the zillions of how-to's scattered around the web.
>>>>>> Also, despite the recommendation of creating your own provider, many
>>>>>> people drop custom RAs in the heartbeat directory.
>>>>>> The simplest approach would be to just symlink heartbeat to clusterlabs,
>>>>>> but I think that's a bad idea. If a custom RA deployment or some package
>>>>>> other than resource-agents puts an RA there, resource-agents will try to
>>>>>> make it a symlink and the other package will try to make it a directory.
>>>>>> Plus, people may have configuration management systems and/or file
>>>>>> integrity systems that need it to be a directory.
>>>>>> So, I'd recommend we keep the heartbeat directory, and keep the old RAs
>>>>>> you list above in it, move the rest of the RAs to the new clusterlabs
>>>>>> directory, and symlink each one back to the heartbeat directory. At the
>>>>>> same time, we can announce the heartbeat provider as deprecated, and
>>>>>> after a very long time (when it's difficult to find references to it via
>>>>>> google), we can drop it.
>>>>> Maybe a way to go for the staging-RAs as well:
>>>>> Have them in clusterlabs-staging and symlinked (during install
>>>>> or package-generation) into clusterlabs ... while they are
>>>>> cleanly separated in the source-tree.
>>>> So, having some more thoughts on this, here's the possible action
>>>> plan (just for heartbeat -> clusterlabs transition + deprecating
>>>> some agents, but clusterlabs-staging -> clusterlabs would be similar):
>>>> # (adapt and) move original heartbeat agents
>>>> 1. have a resource.d subdirectory "clusterlabs" and move (possibly under
>>>>   new names) agents that were a priori updated to reflect new revision
>>>>   of OCF there
>>>> 2. have a resource.d subdirectory ".deprecated" (for instance) and
>>>>   move the RAs that are going to be sunset over there (i.e.,
>>>>   original heartbeat agents = agents moved to clusterlabs + agents
>>>>   moved to .deprecated + agents that remained under heartbeat, pending
>>>>   to be moved under clusterlabs)
>>>> # preparation for backward compatibility
>>>> 3. have a file with old heartbeat name -> new clusterlabs name mapping
>>>>   for the agents from 1., i.e., those that physically changed the directory;
>>>>   the format can be as simple as CSV with "old name; [new name]" lines
>>>>   where omitted new name means that actual name hasn't changed
>>>>   (unlike proposed IPaddr2 -> IP)
>>>> 4. have an XSL template that will convert resource references per the
>>>>   translation file from 3. (this XSLT should be automatically
>>>>   generated based on that file) and a script that will call
>>>>   something like:
>>>>   cibadmin -Q | xsltproc <XSLT> - | cibadmin --replace --xml-pipe
> XSL to replace some agent names? I think sed is enough for that :)
> Perhaps we could expand any RA aliases as part of "cibadmin --upgrade"
> (or have a separate option for it). That would trigger resource
> restarts, though, unless we added intelligence to allow that in pacemaker.
>>>> 5. have a shell script "__cl_compat__" (for instance; a clearly
>>>>   distinguishable name will come in handy later on), that will:
>>>>   - figure which symlink it was called under ("$0") and figure out
>>>>     how it should behave based on file from 3.:
>>>>     . $0 found as old name with new name -> clusterlabs/<new name>
>>>>       will be called
>>>>     . $0 found as old name without new name -> clusterlabs/<old name>
>>>>       will be called
>>>>     . $0 not found as old name -> .deprecated/<old name> will be
>>>>       called if exists (otherwise fail early)
>>>>   - if "$HA_RSCTMP/$(basename $0)_compat" exists, just run:
>>>>     <agent resolved above> "$@"; exit $?
>>>>     the purpose here is to avoid excessive spamming in the logs
>>>>   - touch "$HA_RSCTMP/$(basename $0)_compat"
>>>>   - emit a warning "Your configuration refers to the agent with
>>>>     an obsolete specification", followed with corresponding:
>>>>      . "please consider changing ocf:heartbeat:<old name> to
>>>>         ocf:clusterlabs:<new name>, you may use <script from 4.>
>>>>         to ease such transition"
>>>>      . "please consider changing ocf:heartbeat:<old name> to
>>>>         ocf:clusterlabs:<old name>, you may use <script from 4.>
>>>>         to ease such transition"
>>>>      . "please consider finding another alternative for
>>>>         ocf:heartbeat:<old name> as this agent is not actively
>>>>         maintained and will be dropped in the next major release;
>>>>         alternatively, if you volunteer to maintain it,
>>>>         please reach the developers at clusterlabs.org mailing list"
>>>> # plugging it all together
>>>> 6. for agents moved from heartbeat to either clusterlabs or .deprecated
>>>>   (items 1. and 2.), provide respective symlinks from heartbeat
>>>>   pointing to __cl_compat__ script from 5.
>>>> Possibly recycle for clusterlabs-staging idea.
>>>> Now, for the higher level tools (crm, pcs), they should avoid listing
>>>> or suggesting agents that are symlinks to files matching wildcard
>>>> "__*__", and perhaps even actively suggest the alternative if
>>>> such one is to be used -- this could be reached by making __cl_compat__
>>>> script from 5. handle one new action (to be reflected in the OCF
>>>> revision as optional), say "new-alias" that would output what
>>>> to use instead (based on file from 3. it works with anyway).
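
The dispatch described in step 5 (deciding what to call based on the name the compat script was invoked under and the mapping file from step 3) could be sketched as follows. This is a Python illustration of the decision logic only, not the proposed shell script; parse_mapping and resolve are hypothetical names, and the sample mapping and deprecated set are made up for the example from agents mentioned in this thread.

```python
from typing import Dict, Optional, Set, Tuple


def parse_mapping(text: str) -> Dict[str, Optional[str]]:
    """Parse 'old name; [new name]' lines; a missing new name means 'unchanged'."""
    mapping: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        old, _, new = line.partition(";")
        mapping[old.strip()] = new.strip() or None
    return mapping


def resolve(called_as: str,
            mapping: Dict[str, Optional[str]],
            deprecated: Set[str]) -> Tuple[Optional[str], str]:
    """Return (relative path of the agent to run, advice for the log)."""
    if called_as in mapping:
        new = mapping[called_as] or called_as  # unchanged name when omitted
        return (f"clusterlabs/{new}",
                f"please consider changing ocf:heartbeat:{called_as} "
                f"to ocf:clusterlabs:{new}")
    if called_as in deprecated:
        return (f".deprecated/{called_as}",
                f"please consider finding another alternative for "
                f"ocf:heartbeat:{called_as}")
    return (None, f"ocf:heartbeat:{called_as} is not known, failing early")


# Illustrative data only: one renamed agent, one moved under the same name.
MAPPING = parse_mapping("IPaddr2; IP\nFilesystem;\n")
DEPRECATED = {"Evmsd"}

for name in ("IPaddr2", "Filesystem", "Evmsd", "nonexistent"):
    target, advice = resolve(name, MAPPING, DEPRECATED)
    print(name, "->", target, "|", advice)
```

The same lookup could also back the suggested "new-alias" action, since it already yields the name to use instead.
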
>>>>>> I wouldn't even want to update ClusterLabs docs to use the new name
>>>>>> until all major distros have the new resource-agents, which would
>>>>>> probably be at least a couple of years (I'm looking at you, Debian).

Jan (Poki)
