[ClusterLabs] Cluster resources migration from CMAN to Pacemaker

Wed Jan 27 18:04:05 CET 2016

On 01/27/2016 02:34 AM, jaspal singla wrote:
> Hi Jan,
> 
> Thanks for your replies!!
> 
> I have a couple more concerns; please help!

I'm not familiar with rgmanager, so there may be better ways that
hopefully someone else can suggest, but here are some ideas off the top
of my head:

> 1) In CMAN, there was a meta attribute, autostart=0 (this parameter disables
> the start of all services when RGManager starts). Is there any way to get such
> behavior in Pacemaker?
> 
> I tried is-managed=0, but when I start the cluster through pcs
> cluster start OR pcs cluster start --all, all of my resources get
> started (even the ones that have the meta attribute configured as
> is-managed=false). Any clue how to achieve such behavior?
> 
> What does is-managed=false do?

Any service with is-managed=false should not be started or stopped by
the cluster. If they are already running, they should be left running;
if they are stopped, they should be left stopped.

I'm not sure why it didn't work in your test; maybe attach the output of
"pcs cluster cib" and "pcs status" with the setting in effect.

I don't think there's any exact replacement for autostart in pacemaker.
Probably the closest is to set target-role=Stopped before stopping the
cluster, and set target-role=Started when services are desired to be
started.
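
A rough sketch of that approach, untested against your configuration. The
group name is the one from your converted setup quoted further down; the PCS
variable and the two helper function names are only illustrative, added so
the commands can be dry-run with PCS="echo pcs":

```shell
# Sketch only: assumes pcs is installed and the resource/group ID exists.
# Override PCS (e.g. PCS="echo pcs") to print the commands instead of running them.
PCS="${PCS:-pcs}"

# Before stopping the cluster: tell Pacemaker to keep the resource stopped.
hold_stopped() {
    $PCS resource meta "$1" target-role=Stopped
}

# Later, when the service is wanted: let Pacemaker start it again.
release_started() {
    $PCS resource meta "$1" target-role=Started
}

# Usage:
#   hold_stopped SERVICE-ctm_service-GROUP
#   pcs cluster stop --all
#   ...
#   pcs cluster start --all
#   release_started SERVICE-ctm_service-GROUP
```

"pcs resource disable" and "pcs resource enable" should have the same effect
on target-role, if you prefer those.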

> 2) Please suggest some alternatives to exclusive=0 and __independent_subtree.
> What do we have in Pacemaker instead of these?

My first thought for exclusive=0 would be to configure negative
colocation constraints between that resource and all other resources.
For example:

  pcs constraint colocation add A with B "-INFINITY"

says that A should never run on a node where B is running (B is unaffected).
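
To apply that against every other resource, a small loop may do. This is a
sketch under assumptions: keep_apart is a hypothetical helper name, the
resource IDs are passed in by hand rather than discovered, and PCS exists
only so the commands can be printed with PCS="echo pcs" before running them:

```shell
# Sketch only: assumes pcs is installed and all named resources exist.
PCS="${PCS:-pcs}"

# keep_apart RES OTHER...
# Adds one negative colocation constraint per OTHER, so that RES never
# runs on a node where any of the OTHER resources is running.
keep_apart() {
    res="$1"
    shift
    for other in "$@"; do
        $PCS constraint colocation add "$res" with "$other" "-INFINITY"
    done
}

# Usage: keep_apart A B C D
```

Each constraint only restricts the first resource; if the others must also
avoid it, the reverse constraints would be needed as well.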

For __independent_subtree, each component must be a separate pacemaker
resource, and the constraints between them would depend on exactly what
you were trying to accomplish. The key concepts here are ordering
constraints, colocation constraints, kind=Mandatory/Optional (for
ordering constraints), and ordered sets.
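
As an illustration of those concepts only (not a drop-in equivalent of
__independent_subtree), here is a sketch that ties two separate resources
together with an ordering constraint of a chosen kind plus a colocation
constraint. The helper name order_and_colocate and the resource IDs are
hypothetical, and PCS is there just for dry runs:

```shell
# Sketch only: assumes pcs is installed and both resources exist.
PCS="${PCS:-pcs}"

# order_and_colocate FIRST SECOND KIND
# KIND is Mandatory or Optional (applies to the ordering constraint).
order_and_colocate() {
    # Start FIRST before SECOND; with kind=Optional the order only applies
    # when both happen to be starting or stopping at the same time.
    $PCS constraint order start "$1" then start "$2" kind="$3"
    # Place SECOND on the same node as FIRST.
    $PCS constraint colocation add "$2" with "$1" INFINITY
}

# Usage: order_and_colocate A B Optional
```

Whether you want Mandatory or Optional, and whether you want the colocation
at all, depends on how independent the pieces are supposed to be.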

> My cluster depends on these meta attributes and I am finding nothing in
> Pacemaker as an alternative to them. Please see if it is feasible to
> achieve the same behavior in Pacemaker.
> 
> 
> Thanks
> Jaspal
> 
> 
> 
> 
>> On 24/01/16 16:54 +0530, jaspal singla wrote:
>>> Thanks Digimer for letting me know about the tool. I was unaware of any such
>>> tools!! And because of that I was able to find the clufter-cli tool for the
>>> migration.
>>>
>>> Thanks a lot John for explaining each and everything in a detailed manner.
>>> I really admire the knowledge you guys have!!
>>>
>>> I also noticed that the clufter tool is written by you :). I am very thankful
>>> to you as it would save the ass of millions of people like me who may have had
>>> difficulties in migrating their legacy programs from CMAN to Pacemaker.
>>
>> Glad to hear this, indeed :)
>>
>>> As suggested I tried to migrate my existing cluster.conf file from CMAN to
>>> Pacemaker through the use of clufter. But I have a couple of queries going
>>> forward, and would appreciate it if you could answer these.
>>>
>>> Please find In-line queries:
>>
>> Answers ditto...
>>
>>>> Date: Fri, 22 Jan 2016 21:52:17 +0100
>>>> From: Jan Pokorný <jpokorny at redhat.com>
>>>> Subject: Re: [ClusterLabs] Cluster resources migration from CMAN to
>>>>         Pacemaker
>>>> Message-ID: <20160122205217.GE28856 at redhat.com>
>>>>
>>>> yes, as Digimer mentioned, clufter is the tool you may want to look
>>>> at.  Do not expect fully automatic miracles from it, though.
>>>> It's meant to show the conversion path, but one has to walk it
>>>> very carefully and make adjustments here and there.
>>>> In part because there is not a large overlap between resource agents
>>>> of both kinds.
>>>>
>>>> On 22/01/16 17:32 +0530, jaspal singla wrote:
>>>>> I desperately need some help in order to migrate my cluster configuration
>>>>> from CMAN (RHEL-6.5) to PACEMAKER (RHEL-7.1).
>>>>>
>>>>> I have tried to explore a lot but couldn't find similarities in configuring
>>>>> the same resources (created in CMAN's cluster.conf file) in Pacemaker.
>>>>>
>>>>> I'd like to share the cluster.conf of RHEL-6.5 and want to achieve the same
>>>>> thing through Pacemaker. Any help would be greatly appreciated!!
>>>>>
>>>>> *Cluster.conf file*
>>>>>
>>>>> ######################################################################
>>>>>
>>>>
>>>> [reformatted configuration file below for better readability and added
>>>> some comment in-line]
>>>>
>>>>> <?xml version="1.1"?>
>>>>                  ^^^
>>>>                  no, this is not the way to increase config version
>>>>
>>>> This seems to be quite a frequent mistake; looks like configuration
>>>> tools should have strictly refrained from using this XML declaration
>>>> in the first place.
>>>>
>>>>> <cluster config_version="1" name="HA1-105_CLUSTER">
>>>>>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>>   <clusternodes>
>>>>>     <clusternode name="ha1-105.test.com" nodeid="1" votes="1">
>>>>>       <fence/>
>>>>>     </clusternode>
>>>>
>>>> (I suppose that other nodes were omitted)
>>>>
>>>
>>> No, it's a Single-Node Cluster Geographical Redundancy Configuration.
>>>
>>> The geographical redundancy configuration allows us to locate two Prime
>>> Optical instances at geographically remote sites. One server instance is
>>> active; the other server instance is standby. The HA agent switches to the
>>> standby Element Management System (EMS) instance if an unrecoverable
>>> failure occurs on the active EMS instance. In a single-node cluster
>>> geographical redundancy configuration, there are two clusters with
>>> different names (one on each node), each containing a server.
>>
>> Ah, it's more me not being familiar with rather exotic use cases, and
>> I definitely include the degenerate single-node-on-purpose case here.
>>
>>>>>   </clusternodes>
>>>>>   <cman/>
>>>>>   <fencedevices/>
>>>>>   <rm log_facility="local4" log_level="7">
>>>>>     <failoverdomains>
>>>>>       <failoverdomain name="Ha1-105_Domain" nofailback="0" ordered="0" restricted="0"/>
>>>>
>>>> TODO: have to check what does it mean when FOD is not saturated
>>>>       with any cluster node references
>>>
>>> No worries about the FOD; I don't think it will be in use, as we have
>>> groups in Pacemaker.
>>
>> FYI, I checked the code and it seems utterly useless to define an
>> empty failover domain and to refer to it from the resource group.
>> Based on its properties, just the logged warnings may vary.
>> Furthermore, enabling the restricted property may prevent associated
>> groups from starting at all.
>>
>> Added a warning and corrected the conversion accordingly:
>> https://pagure.io/clufter/92dbe66b4eebb2b935c49bd4295b96c7954451c2
>>
>>>>>     </failoverdomains>
>>>>>     <resources>
>>>>>       <script file="/data/Product/HA/bin/ODG_IFAgent.py" name="REPL_IF"/>
>>>>
>>>> General LSB-compliance-assumed commands are currently using a path hack
>>>> with lsb:XYZ resource specification.  In this very case, it means
>>>> the result after the conversion refers to
>>>> "lsb:../../..//data/Product/HA/bin/FsCheckAgent.py".
>>>>
>>>> Agreed, there should be a better way to support arbitrary locations
>>>> beside /etc/init.d/XYZ.
>>>>
>>>
>>> Configured resources as LSB as you suggested.
>>>>
>>>>>       <script file="/data/Product/HA/bin/ODG_ReplicatorAgent.py" name="ORACLE_REPLICATOR"/>
>>>>>       <script file="/data/Product/HA/bin/OracleAgent.py" name="CTM_SID"/>
>>>>>       <script file="/data/Product/HA/bin/NtwIFAgent.py" name="NTW_IF"/>
>>>>>       <script file="/data/Product/HA/bin/FsCheckAgent.py" name="FSCheck"/>
>>>>>       <script file="/data/Product/HA/bin/ApacheAgent.py" name="CTM_APACHE"/>
>>>>>       <script file="/data/Product/HA/bin/CtmAgent.py" name="CTM_SRV"/>
>>>>>       <script file="/data/Product/HA/bin/RsyncAgent.py" name="CTM_RSYNC"/>
>>>>>       <script file="/data/Product/HA/bin/HeartBeat.py" name="CTM_HEARTBEAT"/>
>>>>>       <script file="/data/Product/HA/bin/FlashBackMonitor.py" name="FLASHBACK"/>
>>>>>     </resources>
>>>>>     <service autostart="0" domain="Ha1-105_Domain" exclusive="0" name="ctm_service" recovery="disable">
>>>>
>>>> autostart="0" discovered a bug in processing towards "pcs commands"
>>>> output:
>>>> https://pagure.io/clufter/57ebc50caf2deddbc6c12042753ce0573a4a260c
>>>
>>>
>>> I don't want some of my configured services to start when Pacemaker
>>> starts (as was possible in RGManager); I want to start those services
>>> manually. Is there any way I can do that?
>>>
>>> Also, I am trying to start the cluster but "Resource Group:
>>> SERVICE-ctm_service-GROUP" is going into an unmanaged state and cannot be
>>> started. Could you please give me some clue as to why it's going into the
>>> unmanaged state and how that can be rectified?
>>
>> Just a quick suggestion for now if it helps:
>>
>> pcs resource manage SERVICE-ctm_service-GROUP
>>
>> or alternatively:
>>
>> pcs resource meta SERVICE-ctm_service-GROUP is-managed=true
>>
>>
>> Will get back to you with answer for __independent_subtree later.
>>
>> --
>> Jan (Poki)
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Mon, 25 Jan 2016 16:49:11 -0500
>> From: Digimer <lists at alteeve.ca>
>> To: Cluster Labs - Users <users at clusterlabs.org>
>> Subject: [ClusterLabs] Moving Anvil! and Striker development to this
>>         list
>> Message-ID: <56A69857.10308 at alteeve.ca>
>> Content-Type: text/plain; charset=utf-8
>>
>> Hi all,
>>
>>   The Anvil! and Striker (and now ScanCore) projects used to have their
>> own low-volume mailing list. Recently it started getting some use and,
>> being the one who advocated the strongest for the merger of lists, I
>> decided to close it down and migrate over to here.
>>
>>   To give a brief overview of what Anvil!, Striker and ScanCore are:
>>
>>   The "Anvil!" is the name of our 2-node HA cluster based on this[1]
>> tutorial. It's also the term we generally use for the full HA stack
>> we've built.
>>
>>   Striker[2] is a web-based front-end for Managing Anvil! clusters and
>> the servers that run on them. It has been in very heavy development the
>> last year and change and we're getting close to the version 2 release
>> "real soon now(tm)".
>>
>>   ScanCore[3] is a new component that runs on both Anvil! nodes and
>> Striker dashboards. It was initially announced at the HA Summit in Brno
>> in 2015 and its release will coincide with Striker v2's release. It is
>> an alert, predictive failure and mitigation program that is technically
>> stand-alone but has been built into the heart of the Anvil! platform. It
>> is inherently redundant and does things like watch for faults, try to
>> recover from known problems autonomously and alert users to these issues.
>>
>>   I've been somewhat reluctant to move our list over because Alteeve,
>> our company and the company that builds and maintains the Anvil!,
>> Striker and ScanCore is for-profit. However, _everything_ we do is open
>> source, so I hope that won't be held against us. :)
>>
>>   If anyone has any comments or concerns about us moving our project
>> discussion to this list, please let me know and I'll do what I can to
>> make sure we address those concerns.
>>
>> Cheers!
>>
>> digimer
>>
>> 1. https://alteeve.ca/w/AN!Cluster_Tutorial_2
>> 2. https://github.com/digimer/striker
>> 3. https://github.com/digimer/striker/tree/master/ScanCore