[ClusterLabs Developers] Deprecating resource agents (was: Re: Resurrecting OCF)

Tue Sep 27 15:39:20 EDT 2016

On 09/22/2016 04:28 AM, Jan Pokorný wrote:
> On 21/09/16 16:26 -0500, Ken Gaillot wrote:
>> On 09/21/2016 10:55 AM, Jan Pokorný wrote:
>>> On 21/09/16 14:50 +1000, Andrew Beekhof wrote:
>>>> I like where this is going.
>>>> Although I don’t think we want to get into the business of trying to
>>>> script config changes from one agent to another, so I’d drop #4
>>>
>>> Not agent parameter changes, just its specification -- to reflect
>>> formally what the proposed symlink-based delegation scheme does when
>>> the old one is still in use.  If the old and new are incompatible,
>>> such automatic delegation is not possible anyway (that's one of
>>> the reasons "description" would come in handy).
>>>
>>> I see there's much bigger potential (parameter renames, ...) but for
>>> that, each agent should be responsible on its own (somehow, subject
>>> of further evolution).
>>>
>>> Also, supposing there are more consumers of RA, the suggestion to
>>> run the script should be more generic ("when used from under
>>> pacemaker, ...").
>>>
>>>> I would make .deprecated a nested directory so that if we want to
>>>> retire (for example) a ClusterLabs agent in the future we can create
>>>> .deprecated/clusterlabs/ and put the agent there. Rather than make
>>>> this heartbeat specific.
>>>
>>> Good point; it would also prevent clashes when single directory should
>>> serve all the providers.
>>
>> I don't understand the desire to treat "deprecated" agents any
>> differently. It should be sufficient to just mention it in their help
>> text / man page / meta-data / other documentation. Pacemaker isn't going
>> to run a "deprecated" agent any differently.
> 
> And I don't understand how you came to the conclusion that
> anything changes from the outer view, besides an occasional note about
> deprecation being emitted to the logs.

I guess my question is, what's the benefit? Why do it? From what I
gather so far:

* It would allow OCF-using software to emit warnings when using a
deprecated agent. (Meta-data could also handle this.)

* If a sysadmin went looking for the code, it would be an obvious
indication that they need to change.

* We may feel more comfortable with removing an agent if it's been in
the "deprecated" location for some time.

Does that cover it, or is there something more?

> It'd be an implementation detail self-contained in resource-agents.

That would depend on how it is designed -- some of the proposals here
would need changes in the OCF standard, and all OCF-using software would
need to be modified to support it.

I think we should keep OCF standard changes backward-compatible as much
as possible, so that agents written to the new standard are hopefully
still usable by older software. I.e. OCFnext = OCF1 + new stuff that can
be ignored by software that doesn't support it.

Consider someone upgrading resource-agents (only) or downloading the
master version of a specific agent, to get a bug fix, but staying on an
older pacemaker and/or GUI.
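
To make the "OCFnext = OCF1 + ignorable new stuff" idea concrete, here is a
minimal sketch in Python. It assumes the "deprecated" meta-data attribute
from my earlier mail (quoted below), which is only a proposal and not part
of any OCF revision; the function names are made up for illustration. The
older consumer reads only what it knows and is unaffected by the extra
attribute, while a newer consumer can pick it up if present.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data carrying the proposed (not standardized) attribute.
META = '<resource-agent name="Evmsd" deprecated="No longer actively maintained"/>'


def agent_name(meta_xml):
    """What an older consumer does: read only the fields it knows about."""
    return ET.fromstring(meta_xml).get("name")


def deprecation_notice(meta_xml):
    """What a newer consumer could add; returns None if the attribute is absent."""
    return ET.fromstring(meta_xml).get("deprecated")


if __name__ == "__main__":
    print(agent_name(META))
    print(deprecation_notice(META))
```

An agent without the attribute simply yields None from the second function,
so nothing has to change for agents that are not deprecated.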

>> When users see ocf:whatever:whatever, they know where to look for the
>> script. Why frustrate them by making them waste time figuring out how a
>> "nonexistent" RA is being used and finding it?
> 
> A symlink makes a clear connection and, besides, I proposed a "new-alias"
> action.  I think you overestimate how often the agents are physically
> investigated (I guess the project would have more committers if it
> was the case).

The average sysadmin is not a developer. They know enough to understand
simple shell scripts, and the more advanced ones are comfortable with
scripting, but very rarely does a sysadmin contribute their changes
upstream. (For a variety of reasons -- sometimes the company they work
for prohibits releasing code or makes it a hassle, sometimes their
changes are too site-specific, sometimes they're not comfortable
submitting their code to scrutiny, etc.)

There are many, many sysadmins who use OCF agents without ever
interacting with upstream. They do know where to look for the stock
agents, and they do examine the code when they have trouble (most often,
to find where a log message is being generated, but also for general
troubleshooting).

If we have a symlink, that handles this concern. On the other hand, we
may want no symlink so they notice it's missing and discover it's
deprecated, but that's at a price of some frustration and time spent.

>> If the goal is to let users know that an agent is deprecated (which is
>> the only reason that I can think of), then we can add an attribute in
>> the meta-data, and UIs/pacemaker can report/log it if present.
>>
>>   <resource-agent
>>     name="Evmsd"
>>     deprecated="No longer actively maintained"
>>   >
>>
>>>> I wonder if some of this should live in pacemaker itself though…
>>>
>>> This runs directly to the other side of the RA-pacemaker bias,
>>> pacemaker caring about RA evolutionary internals :-)
>>>
>>> In the outlook, that would make any separated OCF standard efforts
>>> worthless and we could just call it pacemaker resource standard
>>> right away and forget about any sort of self-containment
>>> (the proposed procedure aims to align with).
>>>
>>> I am not sure that would be the best thing.
>>
>> Agreed, anything we come up with should be explicit in the OCF standard.
>> But I think this behavior could be specified in the standard.
> 
> As the standard provides guarantees for outer interfacing, there's no
> utter need to externalize otherwise self-contained subtleties, in this
> case beyond saying that symlinks to __formatted__ files should be
> excluded from agent lists (might be overridden on demand).
> 
>>>> If resources_action_create() cannot find ocf:${provider}:${agent} in
>>>> its usual location, look up
>>>> ${OCF_ROOT_DIR}/.compat/${provider}/__entries__
>>>>
>>>> Format for __entries__:
>>>>    # old, replacement
>>>>    # ${agent} , ${new_provider}:${new_agent} , ${description}
>>>>    IPaddr , clusterlabs:IP , Replaced with different semantics
>>>>    IPaddr2 , clusterlabs:IP , Moved
>>>>    drbd , linbit:drbd , Moved
>>>>    eDirectory , , Deleted
>>>
>>> Additional "what happened" field might work well in the update
>>> suggestions.
>>>
>>>> Assuming an entry is found:
>>>> - If .compat/${old_provider}/${old_agent} exists, notify the user
>>>>    “somehow”, then call it.
>>>> - Otherwise, return OCF_ERR_NOT_INSTALLED and use ${description} and
>>>>   ${replacement} as the exit reason (which shows up in pcs status).
>>>>
>>>> Perhaps the “somehow” is creating PCMK_OCF_DEPRECATED (with the same
>>>> semantics as PCMK_OCF_DEGRADED) and prepending ${description} to the
>>>> output (assuming it's not a metadata op) and/or the exit reason[1].
>>>> Maybe only on successful start operations to minimise the noise?
>>>>
>>>> [1] Shouldn’t be too hard with some extra fields for 'struct
>>>>     svc_action_private_s’ or svc_action_t
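
For reference, reading the __entries__ format above would not take much. A
rough sketch in Python, assuming exactly three comma-separated fields per
line and "#" comment lines as in the example; the Entry type and
parse_entries name are hypothetical, and nothing like this exists in
pacemaker today:

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    replacement: Optional[str]  # "provider:agent", or None if the agent was deleted
    description: str


def parse_entries(text):
    """Parse '${agent} , ${new_provider}:${new_agent} , ${description}' lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first two commas only, so descriptions may contain commas.
        agent, replacement, description = (f.strip() for f in line.split(",", 2))
        entries[agent] = Entry(replacement or None, description)
    return entries


SAMPLE = """\
# old, replacement
# ${agent} , ${new_provider}:${new_agent} , ${description}
IPaddr , clusterlabs:IP , Replaced with different semantics
IPaddr2 , clusterlabs:IP , Moved
drbd , linbit:drbd , Moved
eDirectory , , Deleted
"""

if __name__ == "__main__":
    for name, entry in parse_entries(SAMPLE).items():
        print(name, entry)
```

An empty replacement field (the eDirectory line) comes back as None, which
is the case where the caller would report the agent as not installed and
use the description as the exit reason.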
>>
>> I like the idea of intelligent aliasing. I'm hoping we can do it without
>> a separate directory structure and meta-meta-data files.
>>
>> What about continuing to use symlinks from the old name to the new name,
>> and adding an aliases section to the agent meta-data?
>>
>>   <resource-agent name="IP" version="1.5">
>>     <version>2.0</version>
>>     <aliases>
>>       <alias name="ocf:heartbeat:IPaddr"
>>         reason="Replaced with different semantics"/>
>>       <alias name="ocf:heartbeat:IPaddr2"
>>         reason="Superseded by clusterlabs provider"/>
>>     </aliases>
>>   </resource-agent>
>>
>> The file would be where people expect to find it, and the intent would
>> be readable.
>>
>> If pacemaker loads an RA's metadata and finds the configured name in the
>> aliases section, it could log a warning with the reason.
>>
>> The only drawback I see is that there is no inherent coordination
>> between the symlinks and the aliases. But either would be fine without
>> the other, so I don't see that as serious.
>>
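
A sketch of what the consumer side of the aliases idea could look like, in
Python for brevity. The <aliases> section is only the proposal above, not
existing OCF meta-data, and alias_warning is a made-up name; the point is
just that the check is a simple lookup of the configured name.

```python
import xml.etree.ElementTree as ET

# Meta-data following the proposed (hypothetical) <aliases> layout.
META = """\
<resource-agent name="IP" version="1.5">
  <version>2.0</version>
  <aliases>
    <alias name="ocf:heartbeat:IPaddr"
      reason="Replaced with different semantics"/>
    <alias name="ocf:heartbeat:IPaddr2"
      reason="Superseded by clusterlabs provider"/>
  </aliases>
</resource-agent>
"""


def alias_warning(meta_xml, configured_name):
    """Return a warning string if configured_name is listed as an alias, else None."""
    root = ET.fromstring(meta_xml)
    for alias in root.iterfind("./aliases/alias"):
        if alias.get("name") == configured_name:
            return "%s is an alias for %s: %s" % (
                configured_name, root.get("name"), alias.get("reason"))
    return None


if __name__ == "__main__":
    print(alias_warning(META, "ocf:heartbeat:IPaddr2"))
```

Software that does not know about <aliases> never looks for the element, so
the agent stays usable there, just without the warning.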
>>>>
>>>>> On 19 Aug 2016, at 6:59 PM, Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>>
>>>>> On 18/08/16 17:27 +0200, Klaus Wenninger wrote:
>>>>>> On 08/18/2016 05:16 PM, Ken Gaillot wrote:
>>>>>>> On 08/18/2016 08:31 AM, Kristoffer Grönlund wrote:
>>>>>>>> Jan Pokorný <jpokorny at redhat.com> writes:
>>>>>>>>
>>>>>>>>> Thinking about that, ClusterLabs may be considered a brand established
>>>>>>>>> well enough for "clusterlabs" provider to work better than anything
>>>>>>>>> general such as previously proposed "core".  Also, it's not expected
>>>>>>>>> there will be more RA-centered projects under this umbrella than
>>>>>>>>> resource-agents (pacemaker deserves to be a provider on its own),
>>>>>>>>> so it would be pretty unambiguous pointer.
>>>>>>>> I like this suggestion as well.
>>>>>>> Sounds good to me.
>>>>>>>
>>>>>>>>> And for new, not well-tested agents within resource-agents, there could
>>>>>>>>> also be a provider schema akin to "clusterlabs-staging" introduced.
>>>>>>>>>
>>>>>>>>> 1 CZK
>>>>>>>> ...and this too.
>>>>>>> I'd rather not see this. If the RA gets promoted to "well-tested",
>>>>>>> everyone's configuration has to change. And there's never a clear line
>>>>>>> between "not well-tested" and "well-tested", so things wind up staying
>>>>>>> in "beta" status long after they're widely used in production, which
>>>>>>> unnecessarily makes people question their reliability.
>>>>>>>
>>>>>>> If an RA is considered experimental, say so in the documentation
>>>>>>> (including the man page and help text), and give it an "0.x" version number.
>>>>>>>
>>>>>>>> Here is another one: While we are moving agents into a new namespace,
>>>>>>>> perhaps it is time to clean up some of the legacy agents that are no
>>>>>>>> longer recommended or of questionable quality? Off the top of my head,
>>>>>>>> there are
>>>>>>>>
>>>>>>>> * heartbeat/Evmsd
>>>>>>>> * heartbeat/EvmsSCC
>>>>>>>> * heartbeat/LinuxSCSI
>>>>>>>> * heartbeat/pingd
>>>>>>>> * heartbeat/IPaddr
>>>>>>>> * heartbeat/ManageRAID
>>>>>>>> * heartbeat/vmware
>>>>>>>>
>>>>>>>> A pet peeve of mine would also be to move heartbeat/IPaddr2 to
>>>>>>>> clusterlabs/IP, to finally get rid of that weird 2 in the name...
>>>>>>> +1!!! (or is it -2?)
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Kristoffer
>>>>>>> Obviously, we need to keep the ocf:heartbeat provider around for
>>>>>>> backward compatibility, for the extensive existing uses both in cluster
>>>>>>> configurations and in the zillions of how-to's scattered around the web.
>>>>>>>
>>>>>>> Also, despite the recommendation of creating your own provider, many
>>>>>>> people drop custom RAs in the heartbeat directory.
>>>>>>>
>>>>>>> The simplest approach would be to just symlink heartbeat to clusterlabs,
>>>>>>> but I think that's a bad idea. If a custom RA deployment or some package
>>>>>>> other than resource-agents puts an RA there, resource-agents will try to
>>>>>>> make it a symlink and the other package will try to make it a directory.
>>>>>>> Plus, people may have configuration management systems and/or file
>>>>>>> integrity systems that need it to be a directory.
>>>>>>>
>>>>>>> So, I'd recommend we keep the heartbeat directory, and keep the old RAs
>>>>>>> you list above in it, move the rest of the RAs to the new clusterlabs
>>>>>>> directory, and symlink each one back to the heartbeat directory. At the
>>>>>>> same time, we can announce the heartbeat provider as deprecated, and
>>>>>>> after a very long time (when it's difficult to find references to it via
>>>>>>> google), we can drop it.
>>>>>>
>>>>>> Maybe a way to go for the staging-RAs as well:
>>>>>> Have them in clusterlabs-staging and symlinked (during install
>>>>>> or package-generation) into clusterlabs ... while they are
>>>>>> cleanly separated in the source-tree.
>>>>>
>>>>> So, having some more thoughts on this, here's the possible action
>>>>> plan (just for heartbeat -> clusterlabs transition + deprecating
>>>>> some agents, but clusterlabs-staging -> clusterlabs would be similar):
>>>>>
>>>>> # (adapt and) move original heartbeat agents
>>>>>
>>>>> 1. have a resource.d subdirectory "clusterlabs" and move (possibly under
>>>>>   new names) agents that were a priori updated to reflect new revision
>>>>>   of OCF there
>>>>>
>>>>> 2. have a resource.d subdirectory ".deprecated" (for instance) and
>>>>>   move the RAs that are going to be sunset over there (i.e.,
>>>>>   original heartbeat agents = agents moved to clusterlabs + agents
>>>>>   moved to .deprecated + agents that remained under heartbeat, pending
>>>>>   to be moved under clusterlabs)
>>>>>
>>>>> # preparation for backward compatibility
>>>>>
>>>>> 3. have a file with old heartbeat name -> new clusterlabs name mapping
>>>>>   for the agents from 1., i.e., those that physically changed directory;
>>>>>   the format can be as simple as CSV with "old name; [new name]" lines
>>>>>   where omitted new name means that actual name hasn't changed
>>>>>   (unlike proposed IPaddr2 -> IP)
>>>>>
>>>>> 4. have an XSL template that will convert resource references per the
>>>>>   translation file from 3. (this XSLT should be automatically
>>>>>   generated based on that file) and a script that will call
>>>>>   something like:
>>>>>   cibadmin -Q | xsltproc <XSLT> - | cibadmin --replace --xml-pipe
>>
>> XSL to replace some agent names? I think sed is enough for that :)
>>
>> Perhaps we could expand any RA aliases as part of "cibadmin --upgrade"
>> (or have a separate option for it). That would trigger resource
>> restarts, though, unless we added intelligence to allow that in pacemaker.
>>
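
Whichever tool does it (XSLT, sed, or cibadmin itself), the rewrite is small.
A sketch in Python, assuming resources appear in the CIB as <primitive>
elements with class, provider and type attributes; the RENAMES table and the
rename_agents name are made up for illustration, and this ignores the
restart question entirely:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: (old provider, old type) -> (new provider, new type)
RENAMES = {("heartbeat", "IPaddr2"): ("clusterlabs", "IP")}

CIB = """\
<cib><configuration><resources>
  <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr2"/>
  <primitive id="db" class="ocf" provider="heartbeat" type="pgsql"/>
</resources></configuration></cib>
"""


def rename_agents(cib_xml, renames):
    """Return the CIB XML with OCF primitives rewritten per the renames table."""
    root = ET.fromstring(cib_xml)
    for prim in root.iter("primitive"):
        if prim.get("class") != "ocf":
            continue  # only OCF agents have a provider to rewrite
        key = (prim.get("provider"), prim.get("type"))
        if key in renames:
            new_provider, new_type = renames[key]
            prim.set("provider", new_provider)
            prim.set("type", new_type)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rename_agents(CIB, RENAMES))
```

Matching on the (provider, type) pair rather than on the bare agent name
avoids touching a same-named agent from another provider, which is the one
thing a naive sed expression could get wrong.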
>>>>> 5. have a shell script "__cl_compat__" (for instance, name clearly
>>>>>   distinguishable will become handy later on), that will:
>>>>>   - figure which symlink it was called under ("$0") and figure out
>>>>>     how it should behave based on file from 3.:
>>>>>     . $0 found as old name with new name -> clusterlabs/<new name>
>>>>>       will be called
>>>>>     . $0 found as old name without new name -> clusterlabs/<old name>
>>>>>       will be called
>>>>>     . $0 not found as old name -> .deprecated/<old name> will be
>>>>>       called if exists (otherwise fail early)
>>>>>   - if "$HA_RSCTMP/$(basename $0)_compat" exists, just run:
>>>>>     $0 "$@"; exit $?
>>>>>     the purpose here is to avoid excessive spamming in the logs
>>>>>   - touch "$HA_RSCTMP/$(basename $0)_compat"
>>>>>   - emit a warning "Your configuration refers to the agent with
>>>>>     an obsolete specification", followed with corresponding:
>>>>>      . "please consider changing ocf:heartbeat:<old name> to
>>>>>         ocf:clusterlabs:<new name>, you may use <script from 4.>
>>>>> 	 to ease such transition"
>>>>>      . "please consider changing ocf:heartbeat:<old name> to
>>>>>         ocf:clusterlabs:<old name>, you may use <script from 4.>
>>>>> 	 to ease such transition"
>>>>>      . "please consider finding another alternative for
>>>>>         ocf:heartbeat:<old name> as this agent is not actively
>>>>> 	 maintained and will be dropped in the next major release;
>>>>> 	 alternatively, if you volunteer to maintain it,
>>>>> 	 please reach the developers at clusterlabs.org mailing list"
>>>>>
>>>>> # plugging it all together
>>>>>
>>>>> 6. for agents moved from heartbeat in any of clusterlabs/.deprecated,
>>>>>   (items 1. and 2.), provide respective symlinks from heartbeat
>>>>>   pointing to __cl_compat__ script from 5.
>>>>>
>>>>> Possibly recycle for clusterlabs-staging idea.
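
To check my reading of the dispatch in step 5, here it is as a small Python
sketch (the real thing would presumably be shell). It assumes the mapping
file from step 3 has one "old name; [new name]" entry per line;
parse_mapping, resolve and the deprecated_agents set are hypothetical names,
and the once-only warning via $HA_RSCTMP is left out.

```python
import os


def parse_mapping(text):
    """Parse 'old name; [new name]' lines; an omitted new name keeps the old one."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        old, _, new = line.partition(";")
        old, new = old.strip(), new.strip()
        mapping[old] = new or old
    return mapping


def resolve(invoked_as, mapping, deprecated_agents):
    """Return the relative path of the agent to run, or None to fail early."""
    old = os.path.basename(invoked_as)  # the symlink name the script was called under
    if old in mapping:
        return "clusterlabs/" + mapping[old]
    if old in deprecated_agents:
        return ".deprecated/" + old
    return None


if __name__ == "__main__":
    m = parse_mapping("IPaddr2; IP\nFilesystem;\n")
    print(resolve("/usr/lib/ocf/resource.d/heartbeat/IPaddr2", m, {"Evmsd"}))
```

The three outcomes correspond to the three bullets in step 5: renamed,
moved under the same name, and deprecated or missing.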
>>>>>
>>>>>
>>>>> Now, for the higher level tools (crm, pcs), they should avoid listing
>>>>> or suggesting agents that are symlinks to files matching wildcard
>>>>> "__*__", and perhaps even actively suggest the alternative if
>>>>> such one is to be used -- this could be achieved by making the __cl_compat__
>>>>> script from 5. handle one new action (to be reflected in the OCF
>>>>> revision as optional), say "new-alias" that would output what
>>>>> to use instead (based on file from 3. it works with anyway).
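
The listing rule for the higher-level tools is easy to state in code. A
Python sketch, assuming the filter applies to a single provider directory
and only to symlinks whose target's base name matches "__*__"; list_agents
is a made-up name and not how crm or pcs enumerate agents today:

```python
import fnmatch
import os


def list_agents(provider_dir):
    """List agents in provider_dir, skipping symlinks that point at __*__ files."""
    agents = []
    for entry in sorted(os.listdir(provider_dir)):
        path = os.path.join(provider_dir, entry)
        if os.path.islink(path):
            target = os.path.basename(os.readlink(path))
            if fnmatch.fnmatchcase(target, "__*__"):
                continue  # compatibility symlink: do not list or suggest it
        agents.append(entry)
    return agents


if __name__ == "__main__":
    # Path is an assumption; the OCF root differs between distributions.
    print(list_agents("/usr/lib/ocf/resource.d/heartbeat"))
```

Symlinks to ordinary agents are still listed, so the existing
heartbeat -> clusterlabs style links would not be hidden by this rule.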
>>>>>
>>>>>>> I wouldn't even want to update ClusterLabs docs to use the new name
>>>>>>> until all major distros have the new resource-agents, which would
>>>>>>> probably be at least a couple of years (I'm looking at you, Debian).