[ClusterLabs] Cluster resources migration from CMAN to Pacemaker

jaspal singla jaspal.singla at gmail.com
Tue Feb 9 10:04:11 UTC 2016


Hi Jan/Digimer,

Thanks for your replies. Based on your inputs, I managed to configure these
values and the results were fine, but I still have some doubts and would
appreciate your help with them. I also tried to dig into some of these issues
on the internet, but due to the lack of CMAN -> Pacemaker migration
documentation, I couldn't find anything.

I have configured 8 scripts under one resource group as you recommended, but
2 of those scripts are not being started by the cluster itself. When I execute
the same scripts manually, they run fine, but through the Pacemaker commands
they do not.

For example:

This is the output of crm_mon command:

###############################################################################################################
Last updated: Mon Feb  8 17:30:57 2016          Last change: Mon Feb  8
17:03:29 2016 by hacluster via crmd on ha1-103.cisco.com
Stack: corosync
Current DC: ha1-103.cisco.com (version 1.1.13-10.el7-44eb2dd) - partition
with quorum
1 node and 10 resources configured

Online: [ ha1-103.cisco.com ]

 Resource Group: ctm_service
     FSCheck
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FsCheckAgent.py):
 Started ha1-103.cisco.com
     NTW_IF
(lsb:../../..//cisco/PrimeOpticalServer/HA/bin/NtwIFAgent.py):  Started
ha1-103.cisco.com
     CTM_RSYNC
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/RsyncAgent.py):  Started
ha1-103.cisco.com
     REPL_IF
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_IFAgent.py): Started
ha1-103.cisco.com
     ORACLE_REPLICATOR
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py):
Started ha1-103.cisco.com
     CTM_SID
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/OracleAgent.py): Started
ha1-103.cisco.com
     CTM_SRV
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):    Stopped
     CTM_APACHE
(lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ApacheAgent.py): Stopped
 Resource Group: ctm_heartbeat
     CTM_HEARTBEAT
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/HeartBeat.py):   Started
ha1-103.cisco.com
 Resource Group: ctm_monitoring
     FLASHBACK
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FlashBackMonitor.py):
 Started ha1-103.cisco.com

Failed Actions:
* CTM_SRV_start_0 on ha1-103.cisco.com 'unknown error' (1): call=577,
status=complete, exitreason='none',
    last-rc-change='Mon Feb  8 17:12:33 2016', queued=0ms, exec=74ms

#################################################################################################################


CTM_SRV and CTM_APACHE are in the Stopped state. Either these services are
not being started by the cluster, or the cluster is failing them somehow; I am
not sure why. When I manually execute the CTM_SRV script, it runs without
issues.
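
(For reference, these are the kinds of commands that should display the
recorded failures and run the agent's start action outside of cluster control;
this is only a sketch, assuming the pcs 0.9.x / pacemaker 1.1.13 tooling that
the crm_mon header above reports:)

# pcs resource failcount show CTM_SRV
# crm_resource --resource CTM_SRV --force-start -V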

-> To execute this script manually, I ran the command below:

# /cisco/PrimeOpticalServer/HA/bin/OracleAgent.py status

Output:

_________________________________________________________________________________________________________________
2016-02-08 17:48:41,888 INFO MainThread CtmAgent
=========================================================
Executing preliminary checks...
 Check Oracle and Listener availability
  => Oracle and listener are up.
 Migration check
  => Migration check completed successfully.
 Check the status of the DB archivelog
  => DB archivelog check completed successfully.
 Check of Oracle scheduler...
  => Check of Oracle scheduler completed successfully
 Initializing database tables
  => Database tables initialized successfully.
 Install in cache the store procedure
  => Installing store procedures completed successfully
 Gather the oracle system stats
  => Oracle stats completed successfully
Preliminary checks completed.
=========================================================
Starting base services...
Starting Zookeeper...
JMX enabled by default
Using config: /opt/CiscoTransportManagerServer/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
 Retrieving name service port...
 Starting name service...
Base services started.
=========================================================
Starting Prime Optical services...
Prime Optical services started.
=========================================================
Cisco Prime Optical Server Version: 10.5.0.0.214 / Oracle Embedded
-------------------------------------------------------------------------------------
      USER       PID      %CPU      %MEM     START      TIME   PROCESS
-------------------------------------------------------------------------------------
      root     16282       0.0       0.0  17:48:11      0:00   CTM Server
      root     16308       0.0       0.1  17:48:16      0:00   CTM Server
      root     16172       0.1       0.1  17:48:10      0:00   NameService
      root     16701      24.8       7.5  17:48:27      0:27   TOMCAT
      root     16104       0.2       0.2  17:48:09      0:00   Zookeeper
-------------------------------------------------------------------------------------
For startup details, see:
/opt/CiscoTransportManagerServer/log/ctms-start.log
2016-02-08 17:48:41,888 WARNING MainThread CtmAgent CTM restartd at attempt
1
_________________________________________________________________________________________________________________


The script runs and I can see that the service was started, but the crm_mon
output still shows the CTM_SRV resource in the Stopped state. Why?
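
(One check that might be relevant here, as a sketch: for an lsb: resource,
Pacemaker decides Started/Stopped purely from the exit code of the script's
"status" action, which per the LSB spec has to be 0 while the service is
running and 3 once it is stopped; whatever text the script prints is ignored.
Assuming the agent path shown in crm_mon above:)

# /cisco/PrimeOpticalServer/HA/bin/CtmAgent.py status ; echo "exit code: $?"

(an LSB-compliant agent should report exit code 0 here while the service is
running, and 3 after a clean stop)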



-> When I try to start the resource through the pcs command, I get the errors
below in the logs. I tried to debug them but couldn't manage to rectify the
problem. I'd really appreciate any help in getting this resolved.

# pcs resource enable CTM_SRV


Output:
_________________________________________________________________________________________________________________

Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:    debug:
determine_op_status:    CTM_SRV_start_0 on ha1-103.cisco.com returned
'unknown error' (1) instead of the expected value: 'ok' (0)
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:  warning:
unpack_rsc_op_failure:  Processing failed op start for CTM_SRV on
ha1-103.cisco.com: unknown error (1)
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:    debug:
determine_op_status:    CTM_SRV_start_0 on ha1-103.cisco.com returned
'unknown error' (1) instead of the expected value: 'ok' (0)
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:  warning:
unpack_rsc_op_failure:  Processing failed op start for CTM_SRV on
ha1-103.cisco.com: unknown error (1)
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
native_print:        CTM_SRV
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):    FAILED
ha1-103.cisco.com
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
get_failcount_full:     CTM_SRV has failed INFINITY times on
ha1-103.cisco.com
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:  warning:
common_apply_stickiness:        Forcing CTM_SRV away from ha1-103.cisco.com
after 1000000 failures (max=1000000)
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      FSCheck: Rolling back scores from CTM_SRV
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      NTW_IF: Rolling back scores from CTM_SRV
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_RSYNC: Rolling back scores from CTM_SRV
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      REPL_IF: Rolling back scores from CTM_SRV
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      ORACLE_REPLICATOR: Rolling back scores from CTM_SRV
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SID: Rolling back scores from CTM_SRV
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SRV: Rolling back scores from CTM_APACHE
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     All nodes for resource CTM_SRV are unavailable,
unclean or shutting down (ha1-103.cisco.com: 1, -1000000)
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     Could not allocate a node for CTM_SRV
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     Processing CTM_SRV_stop_0
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:     info:
native_color:   Resource CTM_SRV cannot run anywhere
Feb 08 17:12:42 [12877] ha1-103.cisco.com    pengine:   notice: LogActions:
    Stop    CTM_SRV (ha1-103.cisco.com)
Feb 08 17:12:42 [12878] ha1-103.cisco.com       crmd:   notice:
te_rsc_command: Initiating action 7: stop CTM_SRV_stop_0 on
ha1-103.cisco.com (local)
Feb 08 17:12:42 [12878] ha1-103.cisco.com       crmd:    debug:
do_lrm_rsc_op:  Stopped 0 recurring operations in preparation for
CTM_SRV_stop_0
Feb 08 17:12:42 [12878] ha1-103.cisco.com       crmd:     info:
do_lrm_rsc_op:  Performing key=7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112
op=CTM_SRV_stop_0
Feb 08 17:12:42 [12875] ha1-103.cisco.com       lrmd:     info:
log_execute:    executing - rsc:CTM_SRV action:stop call_id:578
Feb 08 17:12:42 [12875] ha1-103.cisco.com       lrmd:    debug:
operation_finished:     CTM_SRV_stop_0:498 - exited with rc=0
Feb 08 17:12:42 [12875] ha1-103.cisco.com       lrmd:    debug:
operation_finished:     CTM_SRV_stop_0:498:stderr [ -- empty -- ]
Feb 08 17:12:42 [12875] ha1-103.cisco.com       lrmd:    debug:
operation_finished:     CTM_SRV_stop_0:498:stdout [ 0 ]
Feb 08 17:12:42 [12875] ha1-103.cisco.com       lrmd:     info:
log_finished:   finished - rsc:CTM_SRV action:stop call_id:578 pid:498
exit-code:0 exec-time:142ms queue-time:0ms
Feb 08 17:12:42 [12878] ha1-103.cisco.com       crmd:    debug:
create_operation_update:        do_update_resource: Updating resource
CTM_SRV after stop op complete (interval=0)
Feb 08 17:12:42 [12878] ha1-103.cisco.com       crmd:   notice:
process_lrm_event:      Operation CTM_SRV_stop_0: ok (node=ha1-103.cisco.com,
call=578, rc=0, cib-update=901, confirmed=true)
Feb 08 17:12:42 [12878] ha1-103.cisco.com       crmd:    debug:
process_lrm_event:      ha1-103.cisco.com-CTM_SRV_stop_0:578 [ 0\n ]
Feb 08 17:12:42 [12878] ha1-103.cisco.com       crmd:    debug:
update_history_cache:   Updating history for 'CTM_SRV' with stop op
Feb 08 17:12:42 [12873] ha1-103.cisco.com        cib:     info:
cib_perform_op: +
 /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='CTM_SRV']/lrm_rsc_op[@id='CTM_SRV_last_0']:
 @operation_key=CTM_
SRV_stop_0, @operation=stop,
@transition-key=7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112,
@transition-magic=0:0;7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112,
@call-id=578, @rc-code=0, @last-run=1454969562, @last-rc-change=1454969562,
@exec-time=142
Feb 08 17:12:42 [12878] ha1-103.cisco.com       crmd:     info:
match_graph_event:      Action CTM_SRV_stop_0 (7) confirmed on
ha1-103.cisco.com (rc=0)
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:    debug:
determine_op_status:    CTM_SRV_start_0 on ha1-103.cisco.com returned
'unknown error' (1) instead of the expected value: 'ok' (0)
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:  warning:
unpack_rsc_op_failure:  Processing failed op start for CTM_SRV on
ha1-103.cisco.com: unknown error (1)
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
native_print:        CTM_SRV
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):    Stopped
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
get_failcount_full:     CTM_SRV has failed INFINITY times on
ha1-103.cisco.com
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:  warning:
common_apply_stickiness:        Forcing CTM_SRV away from ha1-103.cisco.com
after 1000000 failures (max=1000000)
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      FSCheck: Rolling back scores from CTM_SRV
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      NTW_IF: Rolling back scores from CTM_SRV
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_RSYNC: Rolling back scores from CTM_SRV
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      REPL_IF: Rolling back scores from CTM_SRV
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      ORACLE_REPLICATOR: Rolling back scores from CTM_SRV
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SID: Rolling back scores from CTM_SRV
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SRV: Rolling back scores from CTM_APACHE
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     All nodes for resource CTM_SRV are unavailable,
unclean or shutting down (ha1-103.cisco.com: 1, -1000000)
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     Could not allocate a node for CTM_SRV
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info:
native_color:   Resource CTM_SRV cannot run anywhere
Feb 08 17:27:42 [12877] ha1-103.cisco.com    pengine:     info: LogActions:
    Leave   CTM_SRV (Stopped)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:    debug:
determine_op_status:    CTM_SRV_start_0 on ha1-103.cisco.com returned
'unknown error' (1) instead of the expected value: 'ok' (0)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:  warning:
unpack_rsc_op_failure:  Processing failed op start for CTM_SRV on
ha1-103.cisco.com: unknown error (1)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
native_print:        CTM_SRV
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):    Stopped
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
get_failcount_full:     CTM_SRV has failed INFINITY times on
ha1-103.cisco.com
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:  warning:
common_apply_stickiness:        Forcing CTM_SRV away from ha1-103.cisco.com
after 1000000 failures (max=1000000)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      FSCheck: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      NTW_IF: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_RSYNC: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      REPL_IF: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      ORACLE_REPLICATOR: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SID: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SRV: Rolling back scores from CTM_APACHE
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     All nodes for resource CTM_SRV are unavailable,
unclean or shutting down (ha1-103.cisco.com: 1, -1000000)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     Could not allocate a node for CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
native_color:   Resource CTM_SRV cannot run anywhere
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info: LogActions:
    Leave   CTM_SRV (Stopped)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:    debug:
determine_op_status:    CTM_SRV_start_0 on ha1-103.cisco.com returned
'unknown error' (1) instead of the expected value: 'ok' (0)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:  warning:
unpack_rsc_op_failure:  Processing failed op start for CTM_SRV on
ha1-103.cisco.com: unknown error (1)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
native_print:        CTM_SRV
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):    Stopped
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
get_failcount_full:     CTM_SRV has failed INFINITY times on
ha1-103.cisco.com
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:  warning:
common_apply_stickiness:        Forcing CTM_SRV away from ha1-103.cisco.com
after 1000000 failures (max=1000000)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      FSCheck: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      NTW_IF: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_RSYNC: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      REPL_IF: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      ORACLE_REPLICATOR: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SID: Rolling back scores from CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SRV: Rolling back scores from CTM_APACHE
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     All nodes for resource CTM_SRV are unavailable,
unclean or shutting down (ha1-103.cisco.com: 1, -1000000)
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     Could not allocate a node for CTM_SRV
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info:
native_color:   Resource CTM_SRV cannot run anywhere
Feb 08 17:38:00 [12877] ha1-103.cisco.com    pengine:     info: LogActions:
    Leave   CTM_SRV (Stopped)
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:    debug:
determine_op_status:    CTM_SRV_start_0 on ha1-103.cisco.com returned
'unknown error' (1) instead of the expected value: 'ok' (0)
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:  warning:
unpack_rsc_op_failure:  Processing failed op start for CTM_SRV on
ha1-103.cisco.com: unknown error (1)
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:     info:
native_print:        CTM_SRV
 (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):    Stopped
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:     info:
get_failcount_full:     CTM_SRV has failed INFINITY times on
ha1-103.cisco.com
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:  warning:
common_apply_stickiness:        Forcing CTM_SRV away from ha1-103.cisco.com
after 1000000 failures (max=1000000)
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SID: Rolling back scores from CTM_SRV
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:     info:
rsc_merge_weights:      CTM_SRV: Rolling back scores from CTM_APACHE
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     All nodes for resource CTM_SRV are unavailable,
unclean or shutting down (ha1-103.cisco.com: 1, -1000000)
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:    debug:
native_assign_node:     Could not allocate a node for CTM_SRV
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:     info:
native_color:   Resource CTM_SRV cannot run anywhere
Feb 08 17:38:20 [12877] ha1-103.cisco.com    pengine:     info: LogActions:
    Leave   CTM_SRV (Stopped)

________________________________________________________________________________________________________________________
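
(Reading the repeated "Forcing CTM_SRV away from ha1-103.cisco.com after
1000000 failures" lines, my understanding is that the fail count for CTM_SRV
has reached INFINITY on this node, so the policy engine will not attempt
another start there until the failure history is cleared; roughly along these
lines, assuming the standard pcs tooling:)

# pcs resource cleanup CTM_SRV
# pcs resource failcount reset CTM_SRV ha1-103.cisco.com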


Thanks
Jaspal


------------------------------
>
> Message: 3
> Date: Sat, 30 Jan 2016 03:48:03 +0100
> From: Jan Pokorný <jpokorny at redhat.com>
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Cluster resources migration from CMAN to
>         Pacemaker
> Message-ID: <20160130024803.GA27849 at redhat.com>
> Content-Type: text/plain; charset="utf-8"
>
> > On 27/01/16 19:41 +0100, Jan Pokorný wrote:
> > On 27/01/16 11:04 -0600, Ken Gaillot wrote:
> >> On 01/27/2016 02:34 AM, jaspal singla wrote:
> >>> 1) In CMAN, there was meta attribute - autostart=0 (This parameter
> disables
> >>> the start of all services when RGManager starts). Is there any way for
> such
> >>> behavior in Pacemaker?
> >
> > Please be more careful about the descriptions; autostart=0 specified
> > at the given resource group ("service" or "vm" tag) means just not to
> > start anything contained in this very one automatically (also upon
> > new resources being defined, IIUIC), definitely not "all services".
> >
> > [...]
> >
> >> I don't think there's any exact replacement for autostart in pacemaker.
> >> Probably the closest is to set target-role=Stopped before stopping the
> >> cluster, and set target-role=Started when services are desired to be
> >> started.
>
> Besides is-managed=false (as currently used in clufter), I also looked
> at downright disabling "start" action, but this turned out to be a naive
> approach caused by unclear documentation.
>
> Pushing for a bit more clarity (hopefully):
> https://github.com/ClusterLabs/pacemaker/pull/905
>
> >>> 2) Please put some alternatives to exclusive=0 and
> __independent_subtree?
> >>> what we have in Pacemaker instead of these?
>
> (exclusive property discussed in the other subthread; as a recap,
> no extra effort is needed to achieve exclusive=0, exclusive=1 is
> currently a show stopper in clufter as neither approach is versatile
> enough)
>
> > For __independent_subtree, each component must be a separate pacemaker
> > resource, and the constraints between them would depend on exactly what
> > you were trying to accomplish. The key concepts here are ordering
> > constraints, colocation constraints, kind=Mandatory/Optional (for
> > ordering constraints), and ordered sets.
>
> Current approach in clufter as of the next branch:
> - __independent_subtree=1 -> do nothing special (hardly can be
>                              improved?)
> - __independent_subtree=2 -> for that very resource, set operations
>                              as follows:
>                              monitor (interval=60s) on-fail=ignore
>                              stop interval=0 on-fail=stop
>
> Groups carrying such resources are not unrolled into primitives plus
> constraints, as the above might suggest (also the default kind=Mandatory
> for the underlying order constraints should fit well).
>
> Please holler if this is not sound.
>
>
> So when put together with some other changes/fixes, current
> suggested/informative sequence of pcs commands goes like this:
>
> pcs cluster auth ha1-105.test.com
> pcs cluster setup --start --name HA1-105_CLUSTER ha1-105.test.com \
>   --consensus 12000 --token 10000 --join 60
> sleep 60
> pcs cluster cib tmp-cib.xml --config
> pcs -f tmp-cib.xml property set stonith-enabled=false
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-FSCheck \
>   lsb:../../..//data/Product/HA/bin/FsCheckAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-NTW_IF \
>   lsb:../../..//data/Product/HA/bin/NtwIFAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_RSYNC \
>   lsb:../../..//data/Product/HA/bin/RsyncAgent.py \
>   op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-REPL_IF \
>   lsb:../../..//data/Product/HA/bin/ODG_IFAgent.py \
>   op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-ORACLE_REPLICATOR \
>   lsb:../../..//data/Product/HA/bin/ODG_ReplicatorAgent.py \
>   op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_SID \
>   lsb:../../..//data/Product/HA/bin/OracleAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_SRV \
>   lsb:../../..//data/Product/HA/bin/CtmAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_APACHE \
>   lsb:../../..//data/Product/HA/bin/ApacheAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_HEARTBEAT \
>   lsb:../../..//data/Product/HA/bin/HeartBeat.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-FLASHBACK \
>   lsb:../../..//data/Product/HA/bin/FlashBackMonitor.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource group add SERVICE-ctm_service-GROUP RESOURCE-script-FSCheck \
>   RESOURCE-script-NTW_IF RESOURCE-script-CTM_RSYNC \
>   RESOURCE-script-REPL_IF RESOURCE-script-ORACLE_REPLICATOR \
>   RESOURCE-script-CTM_SID RESOURCE-script-CTM_SRV \
>   RESOURCE-script-CTM_APACHE
> pcs -f tmp-cib.xml resource \
>   meta SERVICE-ctm_service-GROUP is-managed=false
> pcs -f tmp-cib.xml \
>   resource group add SERVICE-ctm_heartbeat-GROUP \
>   RESOURCE-script-CTM_HEARTBEAT
> pcs -f tmp-cib.xml resource \
>   meta SERVICE-ctm_heartbeat-GROUP migration-threshold=3 \
>   failure-timeout=900
> pcs -f tmp-cib.xml \
>   resource group add SERVICE-ctm_monitoring-GROUP \
>   RESOURCE-script-FLASHBACK
> pcs -f tmp-cib.xml resource \
>   meta SERVICE-ctm_monitoring-GROUP migration-threshold=3 \
>   failure-timeout=900
> pcs cluster cib-push tmp-cib.xml --config
>
>
> Any suggestions welcome...
>
> --
> Jan (Poki)
>