[ClusterLabs] Cluster resources migration from CMAN to Pacemaker (Jan Pokorný)

Sun Jan 24 12:24:38 CET 2016

Hello,

Thanks, Digimer, for letting me know about the tool. I was unaware of any
such tools! Because of that I was able to find the clufter-cli tool for the
migration.

Thanks a lot, Jan, for explaining everything in such a detailed manner. I
really admire the knowledge you guys have!

I also noticed that the clufter tool is written by you :). I am very
thankful to you, as it will save a lot of trouble for the many people like
me who have difficulties migrating their legacy setups from CMAN to Pacemaker.

As suggested, I tried to migrate my existing cluster.conf file from CMAN to
Pacemaker using clufter. I have a couple of queries going forward and would
appreciate it if you could answer them.

Please find my queries in-line:

>
>
> Message: 2
> Date: Fri, 22 Jan 2016 21:52:17 +0100
> From: Jan Pokorný <jpokorny at redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
>         welcomed        <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Cluster resources migration from CMAN to
>         Pacemaker
> Message-ID: <20160122205217.GE28856 at redhat.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hello,
>
> yes, as Digimer mentioned, clufter is the tool you may want to look
> at.  Do not expect fully automatic miracles from it, though.
> It's meant to show the conversion path, but one has to walk it
> very carefully and make adjustments here and there.
> In part because there is not a large overlap between resource agents
> of both kinds.
>
> On 22/01/16 17:32 +0530, jaspal singla wrote:
> > I desperately need some help in order to migrate my cluster configuration
> > from CMAN (RHEL-6.5) to PACEMAKER (RHEL-7.1).
> >
> > I have explored a lot but couldn't work out how to configure the same
> > resources (created in CMAN's cluster.conf file) in Pacemaker.
> >
> > I'd like to share the cluster.conf from RHEL-6.5 and want to achieve the
> > same thing through Pacemaker. Any help would be greatly appreciated!
> >
> > *Cluster.conf file*
> >
> > ######################################################################
> >
>
> [reformatted configuration file below for better readability and added
> some comments in-line]
>
> > <?xml version="1.1"?>
>                  ^^^
>                  no, this is not the way to increase config version
>
> This seems to be quite a frequent mistake; it looks like configuration
> tools should have strictly refrained from using this XML declaration
> in the first place.
>
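
For illustration, the version that is meant to be raised on each change is
the config_version attribute of <cluster>, not the version in the XML
declaration. A minimal sketch of bumping it, assuming the file content is
available as a string; the helper name bump_config_version is made up for
this example and is not part of any cluster tooling:

```python
import xml.etree.ElementTree as ET


def bump_config_version(conf_text: str) -> str:
    """Return cluster.conf text with <cluster config_version> raised by one."""
    root = ET.fromstring(conf_text)
    if root.tag != "cluster":
        raise ValueError("expected <cluster> as the root element")
    current = int(root.get("config_version", "0"))
    root.set("config_version", str(current + 1))
    # The XML declaration is left alone; ElementTree emits none here.
    return ET.tostring(root, encoding="unicode")


sample = '<cluster config_version="1" name="HA1-105_CLUSTER"><cman/></cluster>'
print(bump_config_version(sample))
```

The rest of the document is passed through unchanged; only the attribute
on the root element is touched.
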
> > <cluster config_version="1" name="HA1-105_CLUSTER">
> >   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> >   <clusternodes>
> >     <clusternode name="ha1-105.test.com" nodeid="1" votes="1">
> >       <fence/>
> >     </clusternode>
>
> (I suppose that other nodes were omitted)
>

No, it's a single-node cluster geographical redundancy configuration.

The geographical redundancy configuration allows us to locate two Prime
Optical instances at geographically remote sites. One server instance is
active; the other server instance is standby. The HA agent switches to the
standby Element Management System (EMS) instance if an unrecoverable
failure occurs on the active EMS instance. In a single-node cluster
geographical redundancy configuration, there are two clusters with
different names (one on each node), each containing a server.

>
> >   </clusternodes>
> >   <cman/>
> >   <fencedevices/>
> >   <rm log_facility="local4" log_level="7">
> >     <failoverdomains>
> >       <failoverdomain name="Ha1-105_Domain" nofailback="0" ordered="0"
> >                       restricted="0"/>
>
> TODO: have to check what it means when the FOD is not populated
>       with any cluster node references
>

No worries about the FOD; I don't think it will be used, since we have
groups in Pacemaker.

>
> >     </failoverdomains>
> >     <resources>
> >       <script file="/data/Product/HA/bin/ODG_IFAgent.py" name="REPL_IF"/>
>
> General LSB-compliance-assumed commands are currently using a path hack
> with lsb:XYZ resource specification.  In this very case, it means
> the result after the conversion refers to
> "lsb:../../..//data/Product/HA/bin/FsCheckAgent.py".
>
> Agreed, there should be a better way to support arbitrary locations
> besides /etc/init.d/XYZ.
>

I have configured the resources as LSB, as you suggested.
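
To make the path hack a bit more tangible: the "../../../" prefix climbs
out of the LSB script directory, so the specification ends up pointing at an
absolute path. A minimal sketch of that resolution, assuming the LSB scripts
are looked up under /etc/init.d (the actual directory depends on how
Pacemaker was built; the function name is hypothetical):

```python
import posixpath

# Assumption: directory in which lsb:NAME scripts are looked up.
# On some systems this is /etc/rc.d/init.d instead.
LSB_ROOT_DIR = "/etc/init.d"


def resolve_lsb_spec(spec: str, root: str = LSB_ROOT_DIR) -> str:
    """Resolve an 'lsb:NAME' resource spec to the script path it points at."""
    prefix = "lsb:"
    if not spec.startswith(prefix):
        raise ValueError("not an lsb: resource specification")
    # Join with the root directory, then collapse '..' and doubled slashes.
    return posixpath.normpath(posixpath.join(root, spec[len(prefix):]))


print(resolve_lsb_spec("lsb:../../..//data/Product/HA/bin/FsCheckAgent.py"))
# prints /data/Product/HA/bin/FsCheckAgent.py
```

Climbing three levels is enough for both /etc/init.d and /etc/rc.d/init.d,
since going above the filesystem root has no further effect.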

>
> >       <script file="/data/Product/HA/bin/ODG_ReplicatorAgent.py"
> >               name="ORACLE_REPLICATOR"/>
> >       <script file="/data/Product/HA/bin/OracleAgent.py" name="CTM_SID"/>
> >       <script file="/data/Product/HA/bin/NtwIFAgent.py" name="NTW_IF"/>
> >       <script file="/data/Product/HA/bin/FsCheckAgent.py"
> >               name="FSCheck"/>
> >       <script file="/data/Product/HA/bin/ApacheAgent.py"
> >               name="CTM_APACHE"/>
> >       <script file="/data/Product/HA/bin/CtmAgent.py" name="CTM_SRV"/>
> >       <script file="/data/Product/HA/bin/RsyncAgent.py"
> >               name="CTM_RSYNC"/>
> >       <script file="/data/Product/HA/bin/HeartBeat.py"
> >               name="CTM_HEARTBEAT"/>
> >       <script file="/data/Product/HA/bin/FlashBackMonitor.py"
> >               name="FLASHBACK"/>
> >     </resources>
> >     <service autostart="0" domain="Ha1-105_Domain" exclusive="0"
> >              name="ctm_service" recovery="disable">
>
> autostart="0" uncovered a bug in the processing towards the
> "pcs commands" output:
> https://pagure.io/clufter/57ebc50caf2deddbc6c12042753ce0573a4a260c

I don't want some of the configured services to start when Pacemaker
starts (as was the case with RGManager); I want to start those services
manually. Is there any way I can do that?

Also, I am trying to start the cluster, but "Resource Group:
SERVICE-ctm_service-GROUP" goes into an unmanaged state and cannot be
started. Could you please give me a clue as to why it goes into the
unmanaged state and how that can be rectified?

Here is the resource group snippet:

 Resource Group: SERVICE-ctm_service-GROUP
     RESOURCE-script-FSCheck
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FsCheckAgent.py):
       Started ha1-103.cisco.com (unmanaged)
     RESOURCE-script-NTW_IF
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/NtwIFAgent.py):
       Stopped (unmanaged)
     RESOURCE-script-CTM_RSYNC
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/RsyncAgent.py):
       Stopped (unmanaged)
     RESOURCE-script-REPL_IF
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_IFAgent.py):
       Stopped (unmanaged)
     RESOURCE-script-ORACLE_REPLICATOR
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py):
       Stopped (unmanaged)
     RESOURCE-script-CTM_SID
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/OracleAgent.py):
       (target-role:Stopped) Started ha1-103.cisco.com (unmanaged)
     RESOURCE-script-CTM_SRV
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):
       Stopped (unmanaged)
     RESOURCE-script-CTM_APACHE
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ApacheAgent.py):
       FAILED ha1-103.cisco.com (unmanaged)
 Resource Group: SERVICE-ctm_heartbeat-GROUP
     RESOURCE-script-CTM_HEARTBEAT
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/HeartBeat.py):
       Started ha1-103.cisco.com
 Resource Group: SERVICE-ctm_monitoring-GROUP
     RESOURCE-script-FLASHBACK
       (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FlashBackMonitor.py):
       Started ha1-103.cisco.com
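
One thing that may be related: the generated command sequence quoted further
down in this message contains "meta SERVICE-ctm_service-GROUP
is-managed=false", and is-managed=false is the meta attribute that tells
Pacemaker to leave a resource alone (neither start nor stop it). The sketch
below only scans a list of such command strings for that attribute; the
function name is made up for illustration and whether this is the actual
cause here remains an assumption:

```python
import shlex


def unmanaged_targets(commands):
    """Return the resource/group IDs that a 'pcs ... resource meta'
    command marks with is-managed=false."""
    found = []
    for command in commands:
        words = shlex.split(command)
        if "meta" not in words or "is-managed=false" not in words:
            continue
        # In 'pcs resource meta ID name=value', the ID follows 'meta'.
        found.append(words[words.index("meta") + 1])
    return found


generated = [
    "pcs -f tmp-cib.xml resource meta SERVICE-ctm_service-GROUP is-managed=false",
    "pcs -f tmp-cib.xml resource meta SERVICE-ctm_heartbeat-GROUP "
    "migration-threshold=3 failure-timeout=900",
]
print(unmanaged_targets(generated))
# prints ['SERVICE-ctm_service-GROUP']
```

If that attribute turns out to be the reason, putting the group back under
cluster control (for example with "pcs resource manage") would be the thing
to try, after reviewing why the conversion set it.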

>
> >       <script ref="FSCheck"/>
> >       <script ref="NTW_IF"/>
> >       <script __independent_subtree="2" __max_restarts="20"
> >               __restart_expire_time="900" ref="CTM_RSYNC"/>
> >       <script __independent_subtree="2" __max_restarts="10"
> >               __restart_expire_time="900" ref="REPL_IF"/>
>
> __independent_subtree is currently not supported
>
> >       <script __independent_subtree="2" ref="ORACLE_REPLICATOR"/>
> >       <script ref="CTM_SID">
> >         <script ref="CTM_SRV">
> >           <script ref="CTM_APACHE"/>
> >         </script>
> >       </script>
> >     </service>
> >     <service autostart="1" exclusive="0" max_restarts="3"
> >              name="ctm_heartbeat" recovery="restart"
> >              restart_expire_time="900">
>
> recovery/restart parameters were not supported until now:
> https://pagure.io/clufter/0bddf45587588db38086c6b6498ab77004fa59b4

What should we use instead of __independent_subtree="2" in Pacemaker? Is
there any other way to achieve the previous behavior? Please advise.

>
>
> >       <script ref="CTM_HEARTBEAT"/>
> >     </service>
> >     <service autostart="1" exclusive="0" max_restarts="3"
> >              name="ctm_monitoring" recovery="restart"
> >              restart_expire_time="900">
> >       <script ref="FLASHBACK"/>
> >     </service>
> >   </rm>
> > </cluster>
> >
> > ###############################################
> >
> >
> > *Queries/concerns:*
> >
> > -> How can I specify the above 10 resources in Pacemaker?
>
> Using the newest code from the next branch of the linked repository
> (I will make a standard release shortly), the suggestion, which still
> requires manual review(!), is a sequence of commands like this:
>
> $ clufter ccs2pcscmd -gqs jaspal.conf
> > pcs cluster auth ha1-105.test.com
> > pcs cluster setup --start --name HA1-105_CLUSTER ha1-105.test.com \
> >   --consensus 12000 --token 10000 --join 60
> > sleep 60
> > pcs cluster cib tmp-cib.xml --config
> > pcs -f tmp-cib.xml property set stonith-enabled=false
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-FSCheck \
> >   lsb:../../..//data/Product/HA/bin/FsCheckAgent.py \
> >   op monitor id=RESOURCE-script-FSCheck-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-NTW_IF \
> >   lsb:../../..//data/Product/HA/bin/NtwIFAgent.py \
> >   op monitor id=RESOURCE-script-NTW_IF-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-CTM_RSYNC \
> >   lsb:../../..//data/Product/HA/bin/RsyncAgent.py \
> >   op monitor id=RESOURCE-script-CTM_RSYNC-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-REPL_IF \
> >   lsb:../../..//data/Product/HA/bin/ODG_IFAgent.py \
> >   op monitor id=RESOURCE-script-REPL_IF-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-ORACLE_REPLICATOR \
> >   lsb:../../..//data/Product/HA/bin/ODG_ReplicatorAgent.py \
> >   op monitor id=RESOURCE-script-ORACLE_REPLICATOR-OP-monitor \
> >   name=monitor interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-CTM_SID \
> >   lsb:../../..//data/Product/HA/bin/OracleAgent.py \
> >   op monitor id=RESOURCE-script-CTM_SID-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-CTM_SRV \
> >   lsb:../../..//data/Product/HA/bin/CtmAgent.py \
> >   op monitor id=RESOURCE-script-CTM_SRV-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-CTM_APACHE \
> >   lsb:../../..//data/Product/HA/bin/ApacheAgent.py \
> >   op monitor id=RESOURCE-script-CTM_APACHE-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-CTM_HEARTBEAT \
> >   lsb:../../..//data/Product/HA/bin/HeartBeat.py \
> >   op monitor id=RESOURCE-script-CTM_HEARTBEAT-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource create RESOURCE-script-FLASHBACK \
> >   lsb:../../..//data/Product/HA/bin/FlashBackMonitor.py \
> >   op monitor id=RESOURCE-script-FLASHBACK-OP-monitor name=monitor \
> >   interval=30s
> > pcs -f tmp-cib.xml \
> >   resource group add SERVICE-ctm_service-GROUP RESOURCE-script-FSCheck \
> >   RESOURCE-script-NTW_IF RESOURCE-script-CTM_RSYNC \
> >   RESOURCE-script-REPL_IF RESOURCE-script-ORACLE_REPLICATOR \
> >   RESOURCE-script-CTM_SID RESOURCE-script-CTM_SRV \
> >   RESOURCE-script-CTM_APACHE
> > pcs -f tmp-cib.xml resource \
> >   meta SERVICE-ctm_service-GROUP is-managed=false
> > pcs -f tmp-cib.xml \
> >   resource group add SERVICE-ctm_heartbeat-GROUP \
> >   RESOURCE-script-CTM_HEARTBEAT
> > pcs -f tmp-cib.xml resource \
> >   meta SERVICE-ctm_heartbeat-GROUP migration-threshold=3 \
> >   failure-timeout=900
> > pcs -f tmp-cib.xml \
> >   resource group add SERVICE-ctm_monitoring-GROUP \
> >   RESOURCE-script-FLASHBACK
> > pcs -f tmp-cib.xml resource \
> >   meta SERVICE-ctm_monitoring-GROUP migration-threshold=3 \
> >   failure-timeout=900
> > pcs cluster cib-push tmp-cib.xml --config
>
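
Comparing the generated commands above with the original <service>
elements, max_restarts="3" shows up as migration-threshold=3 and
restart_expire_time="900" as failure-timeout=900. A minimal sketch of that
correspondence follows; the mapping table is only what can be read off the
output above, not taken from clufter's code, and the function name is
hypothetical:

```python
import xml.etree.ElementTree as ET

# Correspondence as observed in the generated pcs commands (an assumption
# for illustration, not clufter's actual implementation).
RESTART_ATTR_TO_META = {
    "max_restarts": "migration-threshold",
    "restart_expire_time": "failure-timeout",
}


def restart_meta_for_services(conf_text):
    """Map each <service> name to the Pacemaker meta attributes implied by
    its restart-related rgmanager attributes."""
    root = ET.fromstring(conf_text)
    result = {}
    for service in root.iter("service"):
        result[service.get("name")] = {
            meta_name: service.get(attr)
            for attr, meta_name in RESTART_ATTR_TO_META.items()
            if service.get(attr) is not None
        }
    return result


conf = """<cluster config_version="1" name="HA1-105_CLUSTER"><rm>
  <service autostart="0" name="ctm_service" recovery="disable"/>
  <service autostart="1" max_restarts="3" name="ctm_heartbeat"
           recovery="restart" restart_expire_time="900"/>
</rm></cluster>"""
for name, meta in restart_meta_for_services(conf).items():
    print(name, meta)
```

Services without those attributes simply get an empty set of meta
attributes in this sketch.
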
> > -> The services used in the <service> section are not init.d services;
> > they use script references to the resources defined above. So, how
> > could I do the same thing in Pacemaker?
>
> see above around "FsCheckAgent.py"
>
> > A couple of concerns I have:
> > -> How do I create failover domains in Pacemaker and link resources to
> > them?
>
> In Pacemaker, there is no direct equivalent of failover domains.
> The constraints and temporal behavior cooked into the concept
> of failover domains are split into orthogonal properties in the
> Pacemaker world[*] and it's true some traits are very hard to model
> there, if achievable without external support (like in the resource
> agent directly) at all.
>
> Also note that failover domains allow for easy mixing of symmetric
> and asymmetric behavior within the subset of nodes for particular
> resources, something really not straightforward in Pacemaker.
>
> On the other hand, Pacemaker offers a very fine-grained approach to
> customizing the behavior of the cluster, so forgetting about the concept
> of failover domains should be relatively painless.
>
> [*] search for failoverdomain in
>
> https://pagure.io/clufter/blob/master/f/__root__/doc/rgmanager-pacemaker.02.resources.txt
>
> > -> By default there are several pre-defined resource APIs in Pacemaker,
> > and we can use them if our requirements match those pre-defined APIs,
> > like IPADDR2, Apache etc. But what if I have some Python scripts and
> > want to use those scripts as resources? Is there any way to do that?
>
> If they are one-off launchers of some long-running process, you may
> want to use ocf:anything resource (contained in resource-agents package):
>
>
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/anything
>
> and refer to them via binfile parameter.
>
> Or you may consider making your scripts OCF compliant:
> http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
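
As a rough idea of what the OCF route involves for a Python script: an
agent is called with an action name and answers with an OCF exit code. The
skeleton below is only a sketch under that assumption; the state file stands
in for a real "is my service running?" check, and all names in it
(demo-agent, STATE_FILE, run) are hypothetical. A real agent would also read
its parameters from OCF_RESKEY_* environment variables.

```python
import os
import sys
import tempfile

# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical marker file standing in for a real service check.
STATE_FILE = os.path.join(tempfile.gettempdir(), "demo-ocf-agent.state")

META_DATA = """<?xml version="1.0"?>
<resource-agent name="demo-agent">
  <version>0.1</version>
  <longdesc lang="en">Illustrative skeleton only.</longdesc>
  <shortdesc lang="en">Demo agent</shortdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="30s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>"""


def monitor():
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


def start():
    open(STATE_FILE, "w").close()
    return OCF_SUCCESS


def stop():
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    # Stopping an already stopped resource must still report success.
    return OCF_SUCCESS


def run(action):
    """Dispatch one OCF action and return its exit code."""
    if action == "meta-data":
        print(META_DATA)
        return OCF_SUCCESS
    handler = {"start": start, "stop": stop, "monitor": monitor}.get(action)
    return handler() if handler else OCF_ERR_UNIMPLEMENTED


def main(argv):
    return run(argv[1] if len(argv) > 1 else "")

# A real agent file would end with: sys.exit(main(sys.argv))
```

The important difference from an LSB script is the monitor action: a cleanly
stopped resource is reported with OCF_NOT_RUNNING (7) rather than a generic
failure.
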
>
> --
> Jan (Poki)

Thanks
Jaspal