<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Hi Jan/Digiman,</div><div><br></div><div>Thanks for your replies. Based on your inputs, I managed to configure these values and the results were fine, but I still have a few doubts I would like your help with. I also tried to dig into some of these issues on the internet, but due to the lack of CMAN-to-Pacemaker migration documentation, I couldn't find anything.</div><div><br></div><div>I have configured 8 scripts under one resource group, as you recommended, but 2 of those scripts are not being executed by the cluster itself. When I execute the same scripts manually, they run fine, but through Pacemaker they do not.</div><div><br></div><div>For example:</div><div><br></div><div>This is the output of the crm_mon command:</div><div><br></div><div>###############################################################################################################</div><div><div>Last updated: Mon Feb 8 17:30:57 2016 Last change: Mon Feb 8 17:03:29 2016 by hacluster via crmd on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Stack: corosync</div><div>Current DC: <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> (version 1.1.13-10.el7-44eb2dd) - partition with quorum</div><div>1 node and 10 resources configured</div><div><br></div><div>Online: [ <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> ]</div><div><br></div><div> Resource Group: ctm_service</div><div> FSCheck (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FsCheckAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> NTW_IF (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/NtwIFAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> CTM_RSYNC (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/RsyncAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> REPL_IF (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_IFAgent.py): Started <a 
href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> ORACLE_REPLICATOR (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> CTM_SID (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/OracleAgent.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div> CTM_APACHE (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ApacheAgent.py): Stopped</div><div> Resource Group: ctm_heartbeat</div><div> CTM_HEARTBEAT (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/HeartBeat.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div> Resource Group: ctm_monitoring</div><div> FLASHBACK (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FlashBackMonitor.py): Started <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div><br></div><div>Failed Actions:</div><div>* CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> 'unknown error' (1): call=577, status=complete, exitreason='none',</div><div> last-rc-change='Mon Feb 8 17:12:33 2016', queued=0ms, exec=74ms</div></div><div><br></div><div>#################################################################################################################</div><div><br></div><div><br></div><div>CTM_SRV and CTM_APACHE are in the Stopped state. These services are either not being started by the cluster, or their start is failing somewhere inside the cluster; I am not sure why. 
When I execute the CTM_SRV script manually, it runs without issues.</div><div><br></div><div>-> To execute the script manually, I ran the command below:</div><div><br></div><div># /cisco/PrimeOpticalServer/HA/bin/OracleAgent.py status<br></div><div><br></div><div>Output:</div><div><br></div><div>_________________________________________________________________________________________________________________</div><div><div>2016-02-08 17:48:41,888 INFO MainThread CtmAgent </div><div>=========================================================</div><div>Executing preliminary checks...</div><div> Check Oracle and Listener availability</div><div> => Oracle and listener are up.</div><div> Migration check</div><div> => Migration check completed successfully.</div><div> Check the status of the DB archivelog</div><div> => DB archivelog check completed successfully.</div><div> Check of Oracle scheduler... </div><div> => Check of Oracle scheduler completed successfully </div><div> Initializing database tables</div><div> => Database tables initialized successfully. </div><div> Install in cache the store procedure</div><div> => Installing store procedures completed successfully </div><div> Gather the oracle system stats</div><div> => Oracle stats completed successfully </div><div>Preliminary checks completed.</div><div>=========================================================</div><div>Starting base services...</div><div>Starting Zookeeper...</div><div>JMX enabled by default</div><div>Using config: /opt/CiscoTransportManagerServer/zookeeper/bin/../conf/zoo.cfg</div><div>Starting zookeeper ... 
STARTED</div><div> Retrieving name service port...</div><div> Starting name service...</div><div>Base services started.</div><div>=========================================================</div><div>Starting Prime Optical services...</div><div>Prime Optical services started.</div><div>=========================================================</div><div>Cisco Prime Optical Server Version: 10.5.0.0.214 / Oracle Embedded</div><div>-------------------------------------------------------------------------------------</div><div> USER PID %CPU %MEM START TIME PROCESS</div><div>-------------------------------------------------------------------------------------</div><div> root 16282 0.0 0.0 17:48:11 0:00 CTM Server</div><div> root 16308 0.0 0.1 17:48:16 0:00 CTM Server</div><div> root 16172 0.1 0.1 17:48:10 0:00 NameService</div><div> root 16701 24.8 7.5 17:48:27 0:27 TOMCAT</div><div> root 16104 0.2 0.2 17:48:09 0:00 Zookeeper</div><div>-------------------------------------------------------------------------------------</div><div>For startup details, see: /opt/CiscoTransportManagerServer/log/ctms-start.log</div><div>2016-02-08 17:48:41,888 WARNING MainThread CtmAgent CTM restartd at attempt 1</div></div><div>_________________________________________________________________________________________________________________</div><div><br></div><div><br></div><div>The script runs, and I can see that the service was started, but the crm_mon output still shows CTM_SRV in the Stopped state. Why?</div><div><br></div><div><br></div><div><br></div><div>-> When I try to start the resource through the pcs command, I get the errors below in the logs. I tried to debug them but couldn't rectify the problem. 
I'd really appreciate any help in getting this resolved.</div><div><br></div><div># pcs resource enable CTM_SRV</div><div><br></div><div><br></div><div>Output:</div><div>_________________________________________________________________________________________________________________</div><div><br></div><div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): FAILED <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV 
away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: FSCheck: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: NTW_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_RSYNC: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: REPL_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: ORACLE_REPLICATOR: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, -1000000)</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Processing CTM_SRV_stop_0</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: 
Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:12:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: notice: LogActions: Stop CTM_SRV (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>)</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: notice: te_rsc_command: Initiating action 7: stop CTM_SRV_stop_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> (local)</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: debug: do_lrm_rsc_op: Stopped 0 recurring operations in preparation for CTM_SRV_stop_0</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: info: do_lrm_rsc_op: Performing key=7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112 op=CTM_SRV_stop_0</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: info: log_execute: executing - rsc:CTM_SRV action:stop call_id:578</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: debug: operation_finished: CTM_SRV_stop_0:498 - exited with rc=0</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: debug: operation_finished: CTM_SRV_stop_0:498:stderr [ -- empty -- ]</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: debug: operation_finished: CTM_SRV_stop_0:498:stdout [ 0 ]</div><div>Feb 08 17:12:42 [12875] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> lrmd: info: log_finished: finished - rsc:CTM_SRV action:stop call_id:578 pid:498 exit-code:0 exec-time:142ms queue-time:0ms</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: debug: create_operation_update: do_update_resource: Updating resource CTM_SRV after stop op complete (interval=0)</div><div>Feb 08 17:12:42 [12878] <a 
href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: notice: process_lrm_event: Operation CTM_SRV_stop_0: ok (node=<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>, call=578, rc=0, cib-update=901, confirmed=true)</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: debug: process_lrm_event: ha1-103.cisco.com-CTM_SRV_stop_0:578 [ 0\n ]</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: debug: update_history_cache: Updating history for 'CTM_SRV' with stop op</div><div>Feb 08 17:12:42 [12873] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='CTM_SRV']/lrm_rsc_op[@id='CTM_SRV_last_0']: @operation_key=CTM_</div><div>SRV_stop_0, @operation=stop, @transition-key=7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112, @transition-magic=0:0;7:177:0:c1f19bee-9119-48fa-9ebd-6ffeaf24e112, @call-id=578, @rc-code=0, @last-run=1454969562, @last-rc-change=1454969562, @exec-time=142</div><div>Feb 08 17:12:42 [12878] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> crmd: info: match_graph_event: Action CTM_SRV_stop_0 (7) confirmed on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> (rc=0)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV 
(lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: FSCheck: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: NTW_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_RSYNC: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: REPL_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: ORACLE_REPLICATOR: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, 
-1000000)</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:27:42 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: LogActions: Leave CTM_SRV (Stopped)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: FSCheck: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: NTW_IF: Rolling back scores from 
CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_RSYNC: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: REPL_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: ORACLE_REPLICATOR: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, -1000000)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: LogActions: Leave CTM_SRV (Stopped)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: 
Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: FSCheck: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: NTW_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_RSYNC: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: REPL_IF: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: ORACLE_REPLICATOR: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:38:00 
[12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, -1000000)</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:38:00 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: LogActions: Leave CTM_SRV (Stopped)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: determine_op_status: CTM_SRV_start_0 on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> returned 'unknown error' (1) instead of the expected value: 'ok' (0)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: unpack_rsc_op_failure: Processing failed op start for CTM_SRV on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: unknown error (1)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_print: CTM_SRV (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py): Stopped</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: get_failcount_full: CTM_SRV has failed INFINITY times on <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a></div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: warning: common_apply_stickiness: Forcing CTM_SRV away from <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> after 1000000 failures (max=1000000)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> 
pengine: info: rsc_merge_weights: CTM_SID: Rolling back scores from CTM_SRV</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: rsc_merge_weights: CTM_SRV: Rolling back scores from CTM_APACHE</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: All nodes for resource CTM_SRV are unavailable, unclean or shutting down (<a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a>: 1, -1000000)</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: debug: native_assign_node: Could not allocate a node for CTM_SRV</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: native_color: Resource CTM_SRV cannot run anywhere</div><div>Feb 08 17:38:20 [12877] <a href="http://ha1-103.cisco.com">ha1-103.cisco.com</a> pengine: info: LogActions: Leave CTM_SRV (Stopped)</div></div><div><br></div><div>________________________________________________________________________________________________________________________</div><div><br></div><div><br></div><div>Thanks</div><div>Jaspal</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
------------------------------<br>
<br>
Message: 3<br>
Date: Sat, 30 Jan 2016 03:48:03 +0100<br>
From: Jan Pokorn? <<a href="mailto:jpokorny@redhat.com">jpokorny@redhat.com</a>><br>
To: <a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>
Subject: Re: [ClusterLabs] Cluster resources migration from CMAN to<br>
Pacemaker<br>
Message-ID: <<a href="mailto:20160130024803.GA27849@redhat.com">20160130024803.GA27849@redhat.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
On 27/01/16 19:41 +0100, Jan Pokorn? wrote:<br>
> On 27/01/16 11:04 -0600, Ken Gaillot wrote:<br>
>> On 01/27/2016 02:34 AM, jaspal singla wrote:<br>
>>> 1) In CMAN, there was meta attribute - autostart=0 (This parameter disables<br>
>>> the start of all services when RGManager starts). Is there any way for such<br>
>>> behavior in Pacemaker?<br>
><br>
> Please be more careful about the descriptions; autostart=0 specified<br>
> at the given resource group ("service" or "vm" tag) means just not to<br>
> start anything contained in this very one automatically (also upon<br>
> new resources being defined, IIUIC), definitely not "all services".<br>
><br>
> [...]<br>
><br>
>> I don't think there's any exact replacement for autostart in pacemaker.<br>
>> Probably the closest is to set target-role=Stopped before stopping the<br>
>> cluster, and set target-role=Started when services are desired to be<br>
>> started.<br>
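
For what it's worth, pcs has shortcuts for exactly that target-role toggling; a sketch for illustration (the group name is a placeholder, not one from this thread):

```shell
# "pcs resource disable" sets target-role=Stopped on a resource or group;
# "pcs resource enable" sets target-role=Started again.
# SERVICE-GROUP is a placeholder name.

pcs resource disable SERVICE-GROUP   # before stopping the cluster
# ... stop / restart the cluster ...
pcs resource enable SERVICE-GROUP    # once the services are actually wanted

# The same effect, spelled out explicitly:
pcs resource meta SERVICE-GROUP target-role=Stopped
```

These commands need a running cluster, so they are shown here only as a sketch of the mapping, not a tested recipe.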
<br>
Beside is-managed=false (as currently used in clufter), I also looked<br>
at downright disabling "start" action, but this turned out to be a naive<br>
approach caused by unclear documentation.<br>
<br>
Pushing for a bit more clarity (hopefully):<br>
<a href="https://github.com/ClusterLabs/pacemaker/pull/905" rel="noreferrer" target="_blank">https://github.com/ClusterLabs/pacemaker/pull/905</a><br>
<br>
>>> 2) Please put some alternatives to exclusive=0 and __independent_subtree?<br>
>>> what we have in Pacemaker instead of these?<br>
<br>
(exclusive property discussed in the other subthread; as a recap,<br>
no extra effort is needed to achieve exclusive=0, exclusive=1 is<br>
currently a show stopper in clufter as neither approach is versatile<br>
enough)<br>
<br>
> For __independent_subtree, each component must be a separate pacemaker<br>
> resource, and the constraints between them would depend on exactly what<br>
> you were trying to accomplish. The key concepts here are ordering<br>
> constraints, colocation constraints, kind=Mandatory/Optional (for<br>
> ordering constraints), and ordered sets.<br>
<br>
Current approach in clufter as of the next branch:<br>
- __independent_subtree=1 -> do nothing special (hardly can be<br>
improved?)<br>
- __independent_subtree=2 -> for that very resource, set operations<br>
as follows:<br>
monitor (interval=60s) on-fail=ignore<br>
stop interval=0 on-fail=stop<br>
<br>
Groups carrying such resources are not unrolled into primitives plus<br>
constraints, as the above might suggest (also default kind=Mandatory<br>
for underlying order constraints should fit well).<br>
<br>
Please holler if this is not sound.<br>
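
In CIB terms, that mapping is just two operation definitions on the resource; a sketch (the ids and agent name below are placeholders):

```xml
<!-- Sketch of the __independent_subtree=2 mapping described above. -->
<primitive id="RESOURCE-script-EXAMPLE" class="lsb" type="ExampleAgent.py">
  <operations>
    <!-- failures of the recurring monitor are ignored rather than escalated -->
    <op id="EXAMPLE-monitor" name="monitor" interval="60s" on-fail="ignore"/>
    <!-- a failed stop just leaves the resource stopped -->
    <op id="EXAMPLE-stop" name="stop" interval="0" on-fail="stop"/>
  </operations>
</primitive>
```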
<br>
<br>
So when put together with some other changes/fixes, current<br>
suggested/informative sequence of pcs commands goes like this:<br>
<br>
pcs cluster auth <a href="http://ha1-105.test.com" rel="noreferrer" target="_blank">ha1-105.test.com</a><br>
pcs cluster setup --start --name HA1-105_CLUSTER <a href="http://ha1-105.test.com" rel="noreferrer" target="_blank">ha1-105.test.com</a> \<br>
--consensus 12000 --token 10000 --join 60<br>
sleep 60<br>
pcs cluster cib tmp-cib.xml --config<br>
pcs -f tmp-cib.xml property set stonith-enabled=false<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-FSCheck \<br>
lsb:../../..//data/Product/HA/bin/FsCheckAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-NTW_IF \<br>
lsb:../../..//data/Product/HA/bin/NtwIFAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_RSYNC \<br>
lsb:../../..//data/Product/HA/bin/RsyncAgent.py \<br>
op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-REPL_IF \<br>
lsb:../../..//data/Product/HA/bin/ODG_IFAgent.py \<br>
op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-ORACLE_REPLICATOR \<br>
lsb:../../..//data/Product/HA/bin/ODG_ReplicatorAgent.py \<br>
op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_SID \<br>
lsb:../../..//data/Product/HA/bin/OracleAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_SRV \<br>
lsb:../../..//data/Product/HA/bin/CtmAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_APACHE \<br>
lsb:../../..//data/Product/HA/bin/ApacheAgent.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-CTM_HEARTBEAT \<br>
lsb:../../..//data/Product/HA/bin/HeartBeat.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource create RESOURCE-script-FLASHBACK \<br>
lsb:../../..//data/Product/HA/bin/FlashBackMonitor.py \<br>
op monitor interval=30s<br>
pcs -f tmp-cib.xml \<br>
resource group add SERVICE-ctm_service-GROUP RESOURCE-script-FSCheck \<br>
RESOURCE-script-NTW_IF RESOURCE-script-CTM_RSYNC \<br>
RESOURCE-script-REPL_IF RESOURCE-script-ORACLE_REPLICATOR \<br>
RESOURCE-script-CTM_SID RESOURCE-script-CTM_SRV \<br>
RESOURCE-script-CTM_APACHE<br>
pcs -f tmp-cib.xml resource \<br>
meta SERVICE-ctm_service-GROUP is-managed=false<br>
pcs -f tmp-cib.xml \<br>
resource group add SERVICE-ctm_heartbeat-GROUP \<br>
RESOURCE-script-CTM_HEARTBEAT<br>
pcs -f tmp-cib.xml resource \<br>
meta SERVICE-ctm_heartbeat-GROUP migration-threshold=3 \<br>
failure-timeout=900<br>
pcs -f tmp-cib.xml \<br>
resource group add SERVICE-ctm_monitoring-GROUP \<br>
RESOURCE-script-FLASHBACK<br>
pcs -f tmp-cib.xml resource \<br>
meta SERVICE-ctm_monitoring-GROUP migration-threshold=3 \<br>
failure-timeout=900<br>
pcs cluster cib-push tmp-cib.xml --config<br>
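
One general caveat with all of these lsb: resources: Pacemaker judges them purely by LSB exit codes ("status" must return 0 when running and 3 when stopped; "start" and "stop" must return 0), and a script that works interactively can still fail as a resource if it violates that contract. A self-contained sanity check against a throwaway dummy script (swap in a real agent path to check that one instead) might look like:

```shell
#!/bin/sh
# Sketch: check that an init script follows the LSB exit-code contract
# Pacemaker expects. A dummy script is generated here; point SCRIPT at
# a real agent to test it instead.

SCRIPT=$(mktemp)
cat > "$SCRIPT" <<'EOF'
#!/bin/sh
case "$1" in
    start|stop) exit 0 ;;  # success
    status)     exit 3 ;;  # LSB: "program is not running"
    *)          exit 2 ;;  # LSB: invalid argument
esac
EOF
chmod +x "$SCRIPT"

check() {
    action=$1; expected=$2
    "$SCRIPT" "$action"; rc=$?
    if [ "$rc" -eq "$expected" ]; then
        echo "$action: rc=$rc (ok)"
    else
        echo "$action: rc=$rc, expected $expected (not LSB compliant)"
    fi
}

check start 0
check stop 0
check status 3   # a stopped service must return 3, not 0 or 1
```

As a side note, after fixing a failing agent, `pcs resource cleanup <resource>` clears the accumulated fail count, which matters once it has reached INFINITY and the resource is banned from the node.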
<br>
<br>
Any suggestions welcome...<br>
<br>
--<br>
Jan (Poki)<br>
<br>
------------------------------<br>
<br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
<a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
<br>
<br>
End of Users Digest, Vol 12, Issue 48<br>
*************************************<br>
</blockquote></div><br></div></div>