Andrew,<br><br>The updated description looks nice, but could you please remove my <a href="http://fbsdata.com">fbsdata.com</a> domain name from the man page?  Also, the "memcached" OCF script was my own creation, and might not be a good example.  Maybe one of the other commonly used example resources like an IP address or mysql or something?<br>
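<br>For instance, here is a rough sketch of the same kind of value built for an IP address resource. The names ClusterIP and node1 are placeholders I made up for illustration, not anything from a real cluster, and the command is only printed rather than run:<br>

```shell
# Hypothetical resource and node names, used only for illustration.
rsc="ClusterIP"
task="monitor"
interval=20000   # same interval value as the memcached monitor example
node="node1"
rc=7             # OCF return code 7 means "not running"

# Value format from the man page text: ${resource}_${task}_${interval}@${node}=${rc}
spec="${rsc}_${task}_${interval}@${node}=${rc}"

# Print the command instead of running it, so no cluster is needed to try this.
echo "crm_simulate -LS --op-inject ${spec}"
```

<br>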


<br>If you guys would like to include my memcached OCF script in the Pacemaker distribution, just let me know and I'll clean it up for public use and email it.<br><br>Thanks again!<br><br>--Cal<br><br><br><div class="gmail_quote">


On Wed, Oct 24, 2012 at 7:32 PM, Andrew Beekhof <span dir="ltr">&lt;<a href="mailto:andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


How is this?<br>

<br>

...<br>

<br>

       -i, --op-inject=value<br>

              Generate a failure for the cluster to react to in the simulation<br>

<br>

              Value is of the form<br>

${resource}_${task}_${interval}@${node}=${rc}.  E.g.<br>

memcached_monitor_20000@m1.fbsdata.com=7<br>

<br>

       -F, --op-fail=value<br>

              If the specified task occurs during the simulation, have<br>

it fail with return code ${rc}<br>

<br>

              Value is of the form<br>

${resource}_${task}_${interval}@${node}=${rc}.  E.g.<br>

memcached_stop_0@m1.fbsdata.com=1<br>

<br>

              The transition will normally stop at the failed action.<br>

To see what happens next, save the result with --save-output and re-run<br>

crm_simulate with --xml-file<br>

<br>

...<br>

<br>

EXAMPLES<br>

       Pretend the recurring memcached monitor failed on node<br>

<a href="http://m1.fbsdata.com" target="_blank">m1.fbsdata.com</a> and, during recovery, that the memcached stop action<br>

did too<br>

<br>

              # crm_simulate -LS --op-inject<br>

memcached:0_monitor_20000@m1.fbsdata.com=7 --op-fail<br>

memcached:0_stop_0@m1.fbsdata.com=1 --save-output<br>

/tmp/memcached-test.xml<br>

<br>

       Now see what the reaction to the stop failure would be<br>

<br>

              # crm_simulate -S --xml-file /tmp/memcached-test.xml<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

On Thu, Oct 25, 2012 at 8:43 AM, Andrew Beekhof &lt;<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>&gt; wrote:<br>

> On Thu, Oct 25, 2012 at 1:37 AM, Cal Heldenbrand &lt;<a href="mailto:cal@fbsdata.com">cal@fbsdata.com</a>&gt; wrote:<br>

>> Thanks Andrew!  My first few attempts at playing around with the failure<br>

>> states are working as expected.<br>

>><br>

>> A few follow-ups below:<br>

>><br>

>><br>

>>> --op-fail isn't the command you want though.<br>

>>> From the man page:<br>

>>><br>

>>>        -i, --op-inject=value<br>

>>>               $rsc_$task_$interval@$node=$rc - Inject the specified<br>

>>> task before running the simulation<br>

>>><br>

>>>        -F, --op-fail=value<br>

>>>               $rsc_$task_$interval@$node=$rc - Fail the specified task<br>

>>> while running the simulation<br>

>>><br>

>>> Note the difference between the two descriptions: before vs. while.<br>

>>> --op-inject is the one you want.  It is mostly useful for pretending a<br>

>>> recurring monitor failed and seeing what the cluster would do about<br>

>>> it.<br>

>>><br>

>>> --op-fail on the other hand, is used for pretending that part of the<br>

>>> recovery process failed.<br>

>><br>

>><br>

>> Your follow up description here is great, and makes more sense.  I was<br>

>> reading "Fail the specified task" literally, as "here's my task, fail it and<br>

>> show me the results."  I'd suggest adding a little paragraph to the man page<br>

>> to elaborate on these points too.<br>

><br>

> Ok, I'll add that today.<br>

><br>

>> Also, can you tell me what all of the return<br>

>> codes are?  Do I have to use integers, or do strings like "error" work?<br>

><br>

> Just integers I'm afraid.<br>

> The full list for OCF agents is here:<br>

> <a href="http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-ocf-return-codes.html" target="_blank">http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-ocf-return-codes.html</a><br>


> LSB return codes are slightly different.<br>

><br>

>> While we're on the subject of documentation / usability, I would also<br>

>> suggest splitting these two features out into more parameters.  (What would<br>

>> happen if I named my resource with an underscore?)  Maybe something like:<br>

>><br>

>> --op-pre-resource=[primitive name]<br>

>> --op-pre-task=[monitor|start|stop]<br>

>> --op-pre-interval=[integer]<br>

>> --op-pre-node=[hostname]<br>

>> --op-pre-rc=[error|timeout|other stuff]<br>

>><br>

>> Then have similar --op-post-* parameters.  Or whatever verbs make the most<br>

>> sense in the spirit of Pacemaker vocabulary.  (pre/post, before/after,<br>

>> inject/fail, input/output, etc)<br>

><br>

> The reason for not doing that is that we wanted to be able to inject<br>

> multiple pre/post failures at a time and see the result.<br>

><br>

>> And, examples are always awesome in man<br>

>> pages too.<br>

>><br>

>> Of course, this is all great future version stuff, but that doesn't help all<br>

>> of the RedHat 6 people that will be using pacemaker 1.1 packages for the<br>

>> next ~10 years until RedHat 7 comes out.<br>

><br>

> Don't worry, the man page updates we just talked about will be in the<br>

> 6.4 packages :)<br>

><br>

>> So I suppose documenting the old<br>

>> code in the online docs is a Good Thing.  :-)<br>

>><br>

>> Thanks again!<br>

>><br>

>> --Cal<br>

>><br>

>><br>

>><br>

>> _______________________________________________<br>

>> Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

>> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

>><br>

>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

>> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>

>><br>

<br>


</div></div></blockquote></div><br>