<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Hi Jan/Digiman,</div><div><br></div><div>Thanks for your replies. Based on your inputs, I managed to configure these values and the results were fine, but I still have a few doubts I would like your help with. I also tried to dig into some of these issues on the internet, but due to the lack of CMAN-to-Pacemaker migration documentation, I couldn't find anything.</div><div><br></div><div>I have configured 8 scripts under one resource group, as you recommended, but 2 of those scripts are not being executed by the cluster itself. When I execute the same scripts manually, they run fine, but through Pacemaker they do not.</div><div><br></div><div>For example:</div><div><br></div><div>This is the output of the crm_mon command:</div><div><br></div><div>###############################################################################################################</div><div><div>Last updated: Mon Feb 8 17:30:57 2016 Last change: Mon Feb 8 17:03:29 2016 by hacluster via crmd on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Stack: corosync</div><div>Current DC: <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> (version 1.1.13-10.el7-44eb2dd) - partition with quorum</div><div>1 node and 10 resources configured</div><div><br></div><div>Online: [ <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> ]</div><div><br></div><div> Resource Group: ctm_service</div><div> FSCheck (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FsCheckAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> NTW_IF (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/NtwIFAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> CTM_RSYNC (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/RsyncAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> REPL_IF (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_IFAgent.py): Started <a 
href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> ORACLE_REPLICATOR (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> CTM_SID (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/OracleAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div> CTM_APACHE (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ApacheAgent.py): Stopped</div><div> Resource Group: ctm_heartbeat</div><div> CTM_HEARTBEAT (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/HeartBeat.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> Resource Group: ctm_monitoring</div><div> FLASHBACK (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FlashBackMonitor.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div><br></div><div>Failed Actions:</div><div>* CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> 'unknown error' (1): call=577, status=complete, exitreason='none',</div><div> last-rc-change='Mon Feb 8 17:12:33 2016', queued=0ms, exec=74ms</div></div><div><br></div><div>#################################################################################################################</div><div><br></div><div><br></div><div>CTM_SRV and CTM_APACHE are in the Stopped state. These services are either not being started by the cluster, or their start is failing somewhere inside the cluster; I am not sure why. 
When I execute the CTM_SRV script manually, it runs without issues.</div><div><br></div><div>-> To execute the script manually, I ran the command below:</div><div><br></div><div># /cisco/PrimeOpticalServer/HA/bin/OracleAgent.py status<br></div><div><br></div><div>Output:</div><div><br></div><div>_________________________________________________________________________________________________________________</div><div><div>2016-02-08 17:48:41,888 INFO MainThread CtmAgent </div><div>=========================================================</div><div>Executing preliminary checks...</div><div> Check Oracle and Listener availability</div><div> => Oracle and listener are up.</div><div> Migration check</div><div> => Migration check completed successfully.</div><div> Check the status of the DB archivelog</div><div> => DB archivelog check completed successfully.</div><div> Check of Oracle scheduler... </div><div> => Check of Oracle scheduler completed successfully </div><div> Initializing database tables</div><div> => Database tables initialized successfully. </div><div> Install in cache the store procedure</div><div> => Installing store procedures completed successfully </div><div> Gather the oracle system stats</div><div> => Oracle stats completed successfully </div><div>Preliminary checks completed.</div><div>=========================================================</div><div>Starting base services...</div><div>Starting Zookeeper...</div><div>JMX enabled by default</div><div>Using config: /opt/CiscoTransportManagerServer/zookeeper/bin/../conf/zoo.cfg</div><div>Starting zookeeper ... 
STARTED</div><div> Retrieving name service port...</div><div> Starting name service...</div><div>Base services started.</div><div>=========================================================</div><div>Starting Prime Optical services...</div><div>Prime Optical services started.</div><div>=========================================================</div><div>Cisco Prime Optical Server Version: 10.5.0.0.214 / Oracle Embedded</div><div>-------------------------------------------------------------------------------------</div><div> USER PID %CPU %MEM START TIME PROCESS</div><div>-------------------------------------------------------------------------------------</div><div> root 16282 0.0 0.0 17:48:11 0:00 CTM Server</div><div> root 16308 0.0 0.1 17:48:16 0:00 CTM Server</div><div> root 16172 0.1 0.1 17:48:10 0:00 NameService</div><div> root 16701 24.8 7.5 17:48:27 0:27 TOMCAT</div><div> root 16104 0.2 0.2 17:48:09 0:00 Zookeeper</div><div>-------------------------------------------------------------------------------------</div><div>For startup details, see: /opt/CiscoTransportManagerServer/log/ctms-start.log</div><div>2016-02-08 17:48:41,888 WARNING MainThread CtmAgent CTM restartd at attempt 1</div></div><div>_________________________________________________________________________________________________________________</div><div><br></div><div><br></div><div>The script runs, and I can see that the service was started, but the crm_mon output still shows CTM_SRV in the Stopped state. Why?</div><div><br></div><div><br></div><div><br></div><div>-> When I try to start the resource through the pcs command, I get the errors below in the logs. I tried to debug them but couldn't rectify the problem. 
I'd really appreciate any help in getting this resolved.</div><div><br></div><div># pcs resource enable CTM_SRV</div><div><br></div><div><br></div><div>Output:</div><div>_________________________________________________________________________________________________________________</div><div><br></div><div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): FAILED <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV 
away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: FSCheck: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: NTW_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_RSYNC: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: REPL_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: ORACLE_REPLICATOR: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, -1000000)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Processing CTM_SRV_stop_0</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: 
Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: notice: LogActions: Stop CTM_SRV (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>)</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: notice: te_rsc_command: Initiating action 7: stop CTM_SRV_stop_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> (local)</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: debug: do_lrm_rsc_op: Stopped 0 recurring operations in preparation for CTM_SRV_stop_0</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: info: do_lrm_rsc_op: Performing key=7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112 op=CTM_SRV_stop_0</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: info: log_execute: executing - rsc:CTM_SRV action:stop call_id:578</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: debug: operation_finished: CTM_SRV_stop_0:498 - exited with rc=0</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: debug: operation_finished: CTM_SRV_stop_0:498:stderr [ -- empty -- ]</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: debug: operation_finished: CTM_SRV_stop_0:498:stdout [ 0 ]</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: info: log_finished: finished - rsc:CTM_SRV action:stop call_id:578 pid:498 exit-code:0 exec-time:142ms queue-time:0ms</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: debug: create_operation_update: do_update_resource: Updating resource CTM_SRV after stop op complete (interval=0)</div><div>Feb 08 17:12:42 [12878] <a 
href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: notice: process_lrm_event: Operation CTM_SRV_stop_0: ok (node=<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>, call=578, rc=0, cib-update=901, confirmed=true)</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: debug: process_lrm_event: ha1-103.cisco.com-CTM_SRV_stop_0:578 [ 0\n ]</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: debug: update_history_cache: Updating history for 'CTM_SRV' with stop op</div><div>Feb 08 17:12:42 [12873] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='CTM_SRV']/lrm_rsc_op[@id='CTM_SRV_last_0']: @operation_key=CTM_</div><div>SRV_stop_0, @operation=stop, @transition-key=7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112, @transition-magic=0:0;7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112, @call-id=578, @rc-code=0, @last-run=1454969562, @last-rc-change=1454969562, @exec-time=142</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: info: match_graph_event: Action CTM_SRV_stop_0 (7) confirmed on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> (rc=0)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV 
(lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: FSCheck: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: NTW_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_RSYNC: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: REPL_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: ORACLE_REPLICATOR: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, 
-1000000)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: LogActions: Leave CTM_SRV (Stopped)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: FSCheck: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: NTW_IF: Rolling back scores from 
CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_RSYNC: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: REPL_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: ORACLE_REPLICATOR: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, -1000000)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: LogActions: Leave CTM_SRV (Stopped)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: 
Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: FSCheck: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: NTW_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_RSYNC: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: REPL_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: ORACLE_REPLICATOR: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:38:00 
[12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, -1000000)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: LogActions: Leave CTM_SRV (Stopped)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> 
pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, -1000000)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: LogActions: Leave CTM_SRV (Stopped)</div></div><div><br></div><div>________________________________________________________________________________________________________________________</div><div><br></div><div><br></div><div>Thanks</div><div>Jaspal</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
------------------------------<br>
<br>
Message: 3<br>
Date: Sat, 30 Jan 2016 03:48:03 +0100<br>
From: Jan Pokorn? <<a href="mailto:jpokorny@redhat.com">jpokorny@redhat.com</a>><br>
To: <a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>
Subject: Re: [ClusterLabs] Cluster resources migration from CMAN to<br>
Pacemaker<br>
Message-ID: <<a href="mailto:20160130024803.GA27849@redhat.com">20160130024803.GA27849@redhat.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
On 27/01/16 19:41 +0100, Jan Pokorn? wrote:<br>
> On 27/01/16 11:04 -0600, Ken Gaillot wrote:<br>
>> On 01/27/2016 02:34 AM, jaspal singla wrote:<br>
>>> 1) In CMAN, there was meta attribute - autostart=0 (This parameter disables<br>
>>> the start of all services when RGManager starts). Is there any way for such<br>
>>> behavior in Pacemaker?<br>
><br>
> Please be more careful about the descriptions; autostart=0 specified<br>
> at the given resource group ("service" or "vm" tag) means just not to<br>
> start anything contained in this very one automatically (also upon<br>
> new resources being defined, IIUIC), definitely not "all services".<br>
><br>
> [...]<br>
><br>
>> I don't think there's any exact replacement for autostart in pacemaker.<br>
>> Probably the closest is to set target-role=Stopped before stopping the<br>
>> cluster, and set target-role=Started when services are desired to be<br>
>> started.<br>
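
For what it's worth, pcs has shortcuts for exactly that target-role toggling; a sketch for illustration (the group name is a placeholder, not one from this thread):

```shell
# "pcs resource disable" sets target-role=Stopped on a resource or group;
# "pcs resource enable" sets target-role=Started again.
# SERVICE-GROUP is a placeholder name.

pcs resource disable SERVICE-GROUP   # before stopping the cluster
# ... stop / restart the cluster ...
pcs resource enable SERVICE-GROUP    # once the services are actually wanted

# The same effect, spelled out explicitly:
pcs resource meta SERVICE-GROUP target-role=Stopped
```

These commands need a running cluster, so they are shown here only as a sketch of the mapping, not a tested recipe.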
<br>
Beside is-managed=false (as currently used in clufter), I also looked<br>
at downright disabling "start" action, but this turned out to be a naive<br>
approach caused by unclear documentation.<br>
<br>
Pushing for a bit more clarity (hopefully):<br>
<a href="https://github.com/ClusterLabs/pacemaker/pull/905" rel="noreferrer" target="_blank">https://github.com/ClusterLabs/pacemaker/pull/905</a><br>
<br>
>>> 2) Please put some alternatives to exclusive=0 and __independent_subtree?<br>
>>> what we have in Pacemaker instead of these?<br>
<br>
(exclusive property discussed in the other subthread; as a recap,<br>
no extra effort is needed to achieve exclusive=0, exclusive=1 is<br>
currently a show stopper in clufter as neither approach is versatile<br>
enough)<br>
<br>
> For __independent_subtree, each component must be a separate pacemaker<br>
> resource, and the constraints between them would depend on exactly what<br>
> you were trying to accomplish. The key concepts here are ordering<br>
> constraints, colocation constraints, kind=Mandatory/Optional (for<br>
> ordering constraints), and ordered sets.<br>
<br>
Current approach in clufter as of the next branch:<br>
- __independent_subtree=1 -> do nothing special (hardly can be<br>
improved?)<br>
- __independent_subtree=2 -> for that very resource, set operations<br>
as follows:<br>
monitor (interval=60s) on-fail=ignore<br>
stop interval=0 on-fail=stop<br>
<br>
Groups carrying such resources are not unrolled into primitives plus<br>
constraints, as the above might suggest (also default kind=Mandatory<br>
for underlying order constraints should fit well).<br>
<br>
Please holler if this is not sound.<br>
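
In CIB terms, that mapping is just two operation definitions on the resource; a sketch (the ids and agent name below are placeholders):

```xml
<!-- Sketch of the __independent_subtree=2 mapping described above. -->
<primitive id="RESOURCE-script-EXAMPLE" class="lsb" type="ExampleAgent.py">
  <operations>
    <!-- failures of the recurring monitor are ignored rather than escalated -->
    <op id="EXAMPLE-monitor" name="monitor" interval="60s" on-fail="ignore"/>
    <!-- a failed stop just leaves the resource stopped -->
    <op id="EXAMPLE-stop" name="stop" interval="0" on-fail="stop"/>
  </operations>
</primitive>
```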
<br>
<br>
So when put together with some other changes/fixes, current<br>
suggested/informative sequence of pcs commands goes like this:<br>
<br>
pcs cluster auth <a href="http://ha1-105.test.com" rel="noreferrer" target="_blank">ha1-105.test.com</a><br>
pcs cluster setup --start --name HA1-105_CLUSTER <a href="http://ha1-105.test.com" rel="noreferrer" target="_blank">ha1-105.test.com</a> \<br>
--consensus 12000 --token 10000 --join 60<br>
sleep 60<br>
pcs cluster cib tmp-cib.xml --config<br>
pcs -f tmp-cib.xml property set stonith-enabled=false<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-FSCheck \<br>
lsb:../../..//data/Product/HA/bin/FsCheckAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-NTW_IF \<br>
lsb:../../..//data/Product/HA/bin/NtwIFAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_RSYNC \<br>
lsb:../../..//data/Product/HA/bin/RsyncAgent.py \<br>
op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-REPL_IF \<br>
lsb:../../..//data/Product/HA/bin/ODG_IFAgent.py \<br>
op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-ORACLE_REPLICATOR \<br>
lsb:../../..//data/Product/HA/bin/ODG_ReplicatorAgent.py \<br>
op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_SID \<br>
lsb:../../..//data/Product/HA/bin/OracleAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_SRV \<br>
lsb:../../..//data/Product/HA/bin/CtmAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_APACHE \<br>
lsb:../../..//data/Product/HA/bin/ApacheAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_HEARTBEAT \<br>
lsb:../../..//data/Product/HA/bin/HeartBeat.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-FLASHBACK \<br>
lsb:../../..//data/Product/HA/bin/FlashBackMonitor.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource group add SERVICE-ctm_service-GROUP RESOURCE-script-FSCheck \<br>
RESOURCE-script-NTW_IF RESOURCE-script-CTM_RSYNC \<br>
RESOURCE-script-REPL_IF RESOURCE-script-ORACLE_REPLICATOR \<br>
RESOURCE-script-CTM_SID RESOURCE-script-CTM_SRV \<br>
RESOURCE-script-CTM_APACHE<br>
pcs -f tmp-cib.xml resource \<br>
meta SERVICE-ctm_service-GROUP is-managed=false<br>
pcs -f tmp-cib.xml \<br>
resource group add SERVICE-ctm_heartbeat-GROUP \<br>
RESOURCE-script-CTM_HEARTBEAT<br>
pcs -f tmp-cib.xml resource \<br>
meta SERVICE-ctm_heartbeat-GROUP migration-threshold=3 \<br>
failure-timeout=900<br>
pcs -f tmp-cib.xml \<br>
resource group add SERVICE-ctm_monitoring-GROUP \<br>
RESOURCE-script-FLASHBACK<br>
pcs -f tmp-cib.xml resource \<br>
meta SERVICE-ctm_monitoring-GROUP migration-threshold=3 \<br>
failure-timeout=900<br>
pcs cluster cib-push tmp-cib.xml --config<br>
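
One general caveat with all of these lsb: resources: Pacemaker judges them purely by LSB exit codes ("status" must return 0 when running and 3 when stopped; "start" and "stop" must return 0), and a script that works interactively can still fail as a resource if it violates that contract. A self-contained sanity check against a throwaway dummy script (swap in a real agent path to check that one instead) might look like:

```shell
#!/bin/sh
# Sketch: check that an init script follows the LSB exit-code contract
# Pacemaker expects. A dummy script is generated here; point SCRIPT at
# a real agent to test it instead.

SCRIPT=$(mktemp)
cat > "$SCRIPT" <<'EOF'
#!/bin/sh
case "$1" in
    start|stop) exit 0 ;;  # success
    status)     exit 3 ;;  # LSB: "program is not running"
    *)          exit 2 ;;  # LSB: invalid argument
esac
EOF
chmod +x "$SCRIPT"

check() {
    action=$1; expected=$2
    "$SCRIPT" "$action"; rc=$?
    if [ "$rc" -eq "$expected" ]; then
        echo "$action: rc=$rc (ok)"
    else
        echo "$action: rc=$rc, expected $expected (not LSB compliant)"
    fi
}

check start 0
check stop 0
check status 3   # a stopped service must return 3, not 0 or 1
```

As a side note, after fixing a failing agent, `pcs resource cleanup <resource>` clears the accumulated fail count, which matters once it has reached INFINITY and the resource is banned from the node.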
<br>
<br>
Any suggestions welcome...<br>
<br>
--<br>
Jan (Poki)<br>
<br>
------------------------------<br>
<br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
<a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
<br>
<br>
End of Users Digest, Vol 12, Issue 48<br>
*************************************<br>
</blockquote></div><br></div></div>