[ClusterLabs] Cluster resources migration from CMAN to Pacemaker

Fri Jan 22 15:52:17 EST 2016

Hello,

yes, as Digimer mentioned, clufter is the tool you may want to look
at.  Do not expect fully automatic miracles from it, though.
It's meant to show the conversion path, but one has to walk it
very carefully and make adjustments here and there, in part
because there is not a large overlap between the resource agents
of the two kinds.

On 22/01/16 17:32 +0530, jaspal singla wrote:
> I desperately need some help in order to migrate my cluster configuration
> from CMAN (RHEL-6.5) to PACEMAKER (RHEL-7.1).
> 
> I have tried to explore a lot but couldn't find similarities configuring
> same resources (created in CMAN's cluster.conf file) to Pacemaker.
> 
> I'd like to share cluster.conf of RHEL-6.5 and want to achieve the same
> thing through Pacemaker. Any help would be greatly appreciated!!
> 
> *Cluster.conf file*
> 
> ######################################################################
> 

[reformatted the configuration file below for better readability and
added some comments inline]

> <?xml version="1.1"?>
                 ^^^
                 no, this is not the way to increase the config version

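This seems to be quite a frequent mistake (the version in the XML
declaration denotes the version of the XML specification itself, while
the configuration revision is tracked by the config_version attribute
of <cluster>); it looks like configuration tools should have strictly
refrained from using this XML declaration in the first place.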
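A minimal Python sketch of bumping the configuration revision the
intended way, i.e. via config_version, while emitting a plain XML 1.0
declaration.  It assumes a well-formed cluster.conf with <cluster> as
the root element; the helper name is hypothetical and not part of any
cluster tool, and ElementTree as used here drops XML comments.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as <?xml version="1.1"?>.
XML_DECL = re.compile(r'^\s*<\?xml[^>]*\?>')


def bump_config_version(xml_text: str) -> str:
    """Return cluster.conf text with config_version incremented by one.

    Hypothetical helper; note that ElementTree drops XML comments.
    """
    # Drop any existing declaration: its version is the XML spec version,
    # not the configuration revision.
    body = XML_DECL.sub("", xml_text, count=1)
    root = ET.fromstring(body)
    if root.tag != "cluster":
        raise ValueError("expected a <cluster> root element")
    current = int(root.get("config_version", "0"))
    root.set("config_version", str(current + 1))
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```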

> <cluster config_version="1" name="HA1-105_CLUSTER">
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="ha1-105.test.com" nodeid="1" votes="1">
>       <fence/>
>     </clusternode>

(I suppose that other nodes were omitted)

>   </clusternodes>
>   <cman/>
>   <fencedevices/>
>   <rm log_facility="local4" log_level="7">
>     <failoverdomains>
>       <failoverdomain name="Ha1-105_Domain" nofailback="0" ordered="0" restricted="0"/>

TODO: have to check what it means when a FOD is not populated
      with any cluster node references

>     </failoverdomains>
>     <resources>
>       <script file="/data/Product/HA/bin/ODG_IFAgent.py" name="REPL_IF"/>

Generic commands that are assumed to be LSB-compliant are currently
handled via a path hack in the lsb:XYZ resource specification (lsb:
agents are looked up in /etc/init.d).  For instance, the FSCheck
resource ends up referring to
"lsb:../../..//data/Product/HA/bin/FsCheckAgent.py" after the conversion.

Agreed, there should be a better way to support arbitrary locations
besides /etc/init.d/XYZ.
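To see why such a spec still points at the right file: lsb: names are
resolved relative to /etc/init.d, and ".." at / stays at /, so the
extra levels are harmless.  The Python sketch below (hypothetical helper
names) builds a spec in the same shape as the conversion output and
resolves it back lexically; it deliberately ignores symlinks, which the
kernel would follow.

```python
import posixpath

# Assumption: lsb: agents are looked up relative to this directory.
LSB_ROOT = "/etc/init.d"


def lsb_spec(script_path: str) -> str:
    """Build an lsb: spec that climbs out of LSB_ROOT to an absolute path."""
    if not script_path.startswith("/"):
        raise ValueError("absolute path expected")
    return "lsb:../../../" + script_path


def resolve_lsb(spec: str) -> str:
    """Lexically resolve an lsb: spec against LSB_ROOT (symlinks ignored)."""
    name = spec.split(":", 1)[1]
    return posixpath.normpath(posixpath.join(LSB_ROOT, name))
```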

>       <script file="/data/Product/HA/bin/ODG_ReplicatorAgent.py" name="ORACLE_REPLICATOR"/>
>       <script file="/data/Product/HA/bin/OracleAgent.py" name="CTM_SID"/>
>       <script file="/data/Product/HA/bin/NtwIFAgent.py" name="NTW_IF"/>
>       <script file="/data/Product/HA/bin/FsCheckAgent.py" name="FSCheck"/>
>       <script file="/data/Product/HA/bin/ApacheAgent.py" name="CTM_APACHE"/>
>       <script file="/data/Product/HA/bin/CtmAgent.py" name="CTM_SRV"/>
>       <script file="/data/Product/HA/bin/RsyncAgent.py" name="CTM_RSYNC"/>
>       <script file="/data/Product/HA/bin/HeartBeat.py" name="CTM_HEARTBEAT"/>
>       <script file="/data/Product/HA/bin/FlashBackMonitor.py" name="FLASHBACK"/>
>     </resources>
>     <service autostart="0" domain="Ha1-105_Domain" exclusive="0" name="ctm_service" recovery="disable">

autostart="0" uncovered a bug in the processing towards the "pcs commands"
output:
https://pagure.io/clufter/57ebc50caf2deddbc6c12042753ce0573a4a260c

>       <script ref="FSCheck"/>
>       <script ref="NTW_IF"/>
>       <script __independent_subtree="2" __max_restarts="20" __restart_expire_time="900" ref="CTM_RSYNC"/>
>       <script __independent_subtree="2" __max_restarts="10" __restart_expire_time="900" ref="REPL_IF"/>

__independent_subtree is currently not supported by the conversion.

>       <script __independent_subtree="2" ref="ORACLE_REPLICATOR"/>
>       <script ref="CTM_SID">
>         <script ref="CTM_SRV">
>           <script ref="CTM_APACHE"/>
>         </script>
>       </script>
>     </service>
>     <service autostart="1" exclusive="0" max_restarts="3" name="ctm_heartbeat" recovery="restart" restart_expire_time="900">

recovery/restart parameters were not supported until now:
https://pagure.io/clufter/0bddf45587588db38086c6b6498ab77004fa59b4

>       <script ref="CTM_HEARTBEAT"/>
>     </service>
>     <service autostart="1" exclusive="0" max_restarts="3" name="ctm_monitoring" recovery="restart" restart_expire_time="900">
>       <script ref="FLASHBACK"/>
>     </service>
>   </rm>
> </cluster>
> 
> ###############################################
> 
> 
> * Queries/concerns:*
> 
> -> How can I configure the above-mentioned 10 resources through Pacemaker?

Using the newest code from the "next" branch of the linked repository
(I will make a standard release shortly), the suggested result, which
still requires manual review(!), is a sequence of commands like this:

$ clufter ccs2pcscmd -gqs jaspal.conf 
> pcs cluster auth ha1-105.test.com
> pcs cluster setup --start --name HA1-105_CLUSTER ha1-105.test.com \
>   --consensus 12000 --token 10000 --join 60
> sleep 60
> pcs cluster cib tmp-cib.xml --config
> pcs -f tmp-cib.xml property set stonith-enabled=false
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-FSCheck \
>   lsb:../../..//data/Product/HA/bin/FsCheckAgent.py \
>   op monitor id=RESOURCE-script-FSCheck-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-NTW_IF \
>   lsb:../../..//data/Product/HA/bin/NtwIFAgent.py \
>   op monitor id=RESOURCE-script-NTW_IF-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_RSYNC \
>   lsb:../../..//data/Product/HA/bin/RsyncAgent.py \
>   op monitor id=RESOURCE-script-CTM_RSYNC-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-REPL_IF \
>   lsb:../../..//data/Product/HA/bin/ODG_IFAgent.py \
>   op monitor id=RESOURCE-script-REPL_IF-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-ORACLE_REPLICATOR \
>   lsb:../../..//data/Product/HA/bin/ODG_ReplicatorAgent.py \
>   op monitor id=RESOURCE-script-ORACLE_REPLICATOR-OP-monitor \
>   name=monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_SID \
>   lsb:../../..//data/Product/HA/bin/OracleAgent.py \
>   op monitor id=RESOURCE-script-CTM_SID-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_SRV \
>   lsb:../../..//data/Product/HA/bin/CtmAgent.py \
>   op monitor id=RESOURCE-script-CTM_SRV-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_APACHE \
>   lsb:../../..//data/Product/HA/bin/ApacheAgent.py \
>   op monitor id=RESOURCE-script-CTM_APACHE-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_HEARTBEAT \
>   lsb:../../..//data/Product/HA/bin/HeartBeat.py \
>   op monitor id=RESOURCE-script-CTM_HEARTBEAT-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-FLASHBACK \
>   lsb:../../..//data/Product/HA/bin/FlashBackMonitor.py \
>   op monitor id=RESOURCE-script-FLASHBACK-OP-monitor name=monitor \
>   interval=30s
> pcs -f tmp-cib.xml \
>   resource group add SERVICE-ctm_service-GROUP RESOURCE-script-FSCheck \
>   RESOURCE-script-NTW_IF RESOURCE-script-CTM_RSYNC \
>   RESOURCE-script-REPL_IF RESOURCE-script-ORACLE_REPLICATOR \
>   RESOURCE-script-CTM_SID RESOURCE-script-CTM_SRV \
>   RESOURCE-script-CTM_APACHE
> pcs -f tmp-cib.xml resource \
>   meta SERVICE-ctm_service-GROUP is-managed=false
> pcs -f tmp-cib.xml \
>   resource group add SERVICE-ctm_heartbeat-GROUP \
>   RESOURCE-script-CTM_HEARTBEAT
> pcs -f tmp-cib.xml resource \
>   meta SERVICE-ctm_heartbeat-GROUP migration-threshold=3 \
>   failure-timeout=900
> pcs -f tmp-cib.xml \
>   resource group add SERVICE-ctm_monitoring-GROUP \
>   RESOURCE-script-FLASHBACK
> pcs -f tmp-cib.xml resource \
>   meta SERVICE-ctm_monitoring-GROUP migration-threshold=3 \
>   failure-timeout=900
> pcs cluster cib-push tmp-cib.xml --config
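For review purposes, the per-service part of the output above boils
down to a small attribute mapping.  The Python sketch below reproduces
only what the tool emitted for this particular configuration
(autostart="0" to is-managed=false, max_restarts to migration-threshold,
restart_expire_time to failure-timeout); the function names are
hypothetical, and this is neither clufter's actual code nor a verified
general rule.

```python
def service_meta(attrs: dict) -> dict:
    """Map rgmanager <service> attributes to Pacemaker group meta attributes,
    mirroring the generated commands for this configuration only."""
    meta = {}
    if attrs.get("autostart", "1") == "0":
        meta["is-managed"] = "false"
    if "max_restarts" in attrs:
        meta["migration-threshold"] = attrs["max_restarts"]
    if "restart_expire_time" in attrs:
        meta["failure-timeout"] = attrs["restart_expire_time"]
    return meta


def meta_cmd(group: str, meta: dict, cib: str = "tmp-cib.xml") -> str:
    """Render the pcs command applying the meta attributes to a group."""
    pairs = " ".join(f"{key}={value}" for key, value in meta.items())
    return f"pcs -f {cib} resource meta {group} {pairs}"
```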

> -> the services being used in <service> section are not init.d services,
> these services uses script reference of above defined resources. So, how
> could I do the same thing in Pacemaker?

See the note above about "FsCheckAgent.py" (the lsb: path hack).

> Couple of concerns I have:
> -> How do I create failover domains in pacemaker and link resources to it?

In Pacemaker, there is no direct equivalent of failover domains.
The constraints and temporal behavior baked into the concept
of failover domains are split into orthogonal properties in the
Pacemaker world[*], and admittedly some traits are very hard to model
there, if they are achievable at all without external support (e.g.,
directly in the resource agent).

Also note that failover domains allow for easy mixing of symmetric
and asymmetric behavior within a subset of nodes for particular
resources, something that is not at all straightforward in Pacemaker.

On the other hand, Pacemaker offers a very fine-grained approach to
customizing the behavior of the cluster, so forgetting about the concept
of failover domains should be relatively painless.

[*] search for failoverdomain in
    https://pagure.io/clufter/blob/master/f/__root__/doc/rgmanager-pacemaker.02.resources.txt
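As a rough illustration only, the most common failover-domain traits
can be approximated with location constraints and stickiness.  The
Python sketch below assumes: members given as (node, priority) pairs
where a lower priority is more preferred, a hypothetical base score of
100, "avoids" constraints on non-member nodes for restricted domains,
and INFINITY stickiness for nofailback.  The document referenced in [*]
remains the better guide to the actual trade-offs.

```python
def fod_constraints(rsc, members, all_nodes,
                    ordered=False, restricted=False, nofailback=False):
    """Approximate an rgmanager failover domain with pcs commands (sketch).

    members: list of (node, priority) pairs, lower priority = preferred.
    """
    base = 100  # hypothetical score scale
    cmds = []
    for node, prio in members:
        score = base - prio if ordered else base
        cmds.append(f"pcs constraint location {rsc} prefers {node}={score}")
    if restricted:
        # Keep the resource off every node outside the domain.
        member_names = {node for node, _ in members}
        for node in all_nodes:
            if node not in member_names:
                cmds.append(f"pcs constraint location {rsc} avoids {node}")
    if nofailback:
        # Stickiness outweighing the preferences prevents moving back.
        cmds.append(f"pcs resource meta {rsc} resource-stickiness=INFINITY")
    return cmds
```

Note that a domain without any member nodes, like Ha1-105_Domain above,
yields no commands at all in this sketch unless it is restricted.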

> -> By default there are several pre-defined resource API's given in
> Pacemaker and we can use them if our requirements match with pre-defined
> API's like IPADDR2, Apache etc. But what if I have some python scripts and
> want to use those scripts as resources? Is there any way to do that?

If they are simple launchers of some long-running process, you may
want to use the ocf:heartbeat:anything resource agent (contained in the
resource-agents package):

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/anything

and refer to them via its binfile parameter.
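A Python sketch that builds the corresponding pcs invocation for one of
the scripts, in the same tmp-cib.xml style as the generated commands;
the resource ID, the 30s monitor interval and the function name are
assumptions for illustration.  As far as I understand the agent, it
tracks the PID of the process it started, so the script should stay
running in the foreground (and be executable, with a proper shebang).

```python
import shlex


def anything_cmd(rsc_id, binfile, cmdline_options=None, cib="tmp-cib.xml"):
    """Return the argv of a pcs command creating an ocf:heartbeat:anything resource."""
    argv = ["pcs", "-f", cib, "resource", "create", rsc_id,
            "ocf:heartbeat:anything", f"binfile={binfile}"]
    if cmdline_options:
        # Extra arguments handed to binfile by the anything agent.
        argv.append(f"cmdline_options={cmdline_options}")
    argv += ["op", "monitor", "interval=30s"]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command for manual review instead of running it.
    print(shlex.join(anything_cmd("REPL_IF", "/data/Product/HA/bin/ODG_IFAgent.py")))
```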

Or you may consider making your scripts OCF compliant:
http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
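For orientation, a minimal Python skeleton of an OCF-style agent: the
action arrives as the first argument, parameters as OCF_RESKEY_*
environment variables, and results are reported via the OCF exit codes.
The agent name, its pidfile parameter and the PID-file based health
check are assumptions; the actual start/stop logic of the existing
scripts still has to be filled in, and the guide linked above is the
authority on the details.

```python
#!/usr/bin/env python3
"""Minimal OCF-style resource agent skeleton (sketch, not a full RA)."""
import os
import sys

# OCF return codes used below.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7

META_DATA = """<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="ctm_example" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Hypothetical example agent.</longdesc>
  <shortdesc lang="en">Example agent</shortdesc>
  <parameters>
    <parameter name="pidfile" required="1">
      <longdesc lang="en">PID file of the managed process.</longdesc>
      <shortdesc lang="en">PID file</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="30s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
"""


def is_running(pidfile):
    """Return True if the PID stored in pidfile refers to a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def main(argv, env):
    action = argv[1] if len(argv) > 1 else ""
    if action == "meta-data":
        sys.stdout.write(META_DATA)
        return OCF_SUCCESS
    pidfile = env.get("OCF_RESKEY_pidfile")
    if not pidfile:
        return OCF_ERR_CONFIGURED
    running = is_running(pidfile)
    if action == "monitor":
        return OCF_SUCCESS if running else OCF_NOT_RUNNING
    if action == "start" and running:
        return OCF_SUCCESS  # start must succeed if already running
    if action == "stop" and not running:
        return OCF_SUCCESS  # stop must succeed if already stopped
    # Placeholder: the real start/stop logic of the existing script goes here.
    return OCF_ERR_UNIMPLEMENTED


if __name__ == "__main__":
    sys.exit(main(sys.argv, os.environ))
```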

-- 
Jan (Poki)