<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot <span dir="ltr"><<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On 09/22/2016 09:53 AM, Jan Pokorný wrote:<br>

> On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote:<br>

>> Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> writes:<br>

>><br>

>>> I'm not saying it's a bad idea, just that it's more complicated than it<br>

>>> first sounds, so it's worth thinking through the implications.<br>

>><br>

>> Thinking about it and looking at how complicated it gets, maybe what<br>

>> you'd really want, to make it clearer for the user, is the ability to<br>

>> explicitly configure the behavior, either globally or per-resource. So<br>

>> instead of having to tweak a set of variables that interact in complex<br>

>> ways, you'd configure something like rule expressions,<br>

>><br>

>> &lt;on_fail&gt;<br>

>>   &lt;restart repeat="3" /&gt;<br>

>>   &lt;migrate timeout="60s" /&gt;<br>

>>   &lt;fence/&gt;<br>

>> &lt;/on_fail&gt;<br>

>><br>

>> So, try to restart the service 3 times, if that fails migrate the<br>

>> service, if it still fails, fence the node.<br>

>><br>

>> (obviously the details and XML syntax are just an example)<br>

>><br>

>> This would then replace on-fail, migration-threshold, etc.<br>

><br>

> I must admit that in previous emails in this thread, I wasn't able to<br>

> follow during the first pass, which is not the case with this procedural<br>

> (sequence-ordered) approach.  Though someone can argue it doesn't take<br>

> type of operation into account, which might again open the door for<br>

> non-obvious interactions.<br>

<br>

</span>"restart" is the only on-fail value that it makes sense to escalate.<br>

<br>

block/stop/fence/standby are final. Block means "don't touch the<br>

resource again", so there can't be any further response to failures.<br>

Stop/fence/standby move the resource off the local node, so failure<br>

handling is reset (there are 0 failures on the new node to begin with).<br>

<br>

"Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures<br>

then migrate", but I can't think of a real-world situation where that<br>

makes sense, </blockquote><div><br></div><div>really?</div><div><br></div><div>it is not uncommon to hear "I know it's failed, but I don't want the cluster to do anything until it's _really_ failed"  </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">and it would be a significant re-implementation of "ignore"<br>

(which currently ignores the state of having failed, as opposed to a<br>

particular instance of failure).<br></blockquote><div><br></div><div>agreed</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

What the interface needs to express is: "If this operation fails,<br>

optionally try a soft recovery [always stop+start], but if &lt;N&gt; failures<br>

occur on the same node, proceed to a [configurable] hard recovery".<br>

<br>

And of course the interface will need to be different depending on how<br>

certain details are decided, e.g. whether any failures count toward &lt;N&gt;<br>

or just failures of one particular operation type, and whether the hard<br>

recovery type can vary depending on what operation failed.<br>

<div class="gmail-HOEnZb"><div class="gmail-h5"><br>

</div></div></blockquote></div><br></div></div>
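<div>For what it's worth, the rule quoted above ("soft recovery until N failures occur on the same node, then a configurable hard recovery") could be sketched roughly like this. Everything here is hypothetical: the class, the field names and the choice to count every failure regardless of operation type are my assumptions for illustration, not Pacemaker code or syntax.</div>

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical sketch only; none of these names exist in Pacemaker.
# The "final" on-fail values listed in the quoted text.
HARD_RECOVERIES = {"block", "stop", "fence", "standby"}


@dataclass
class EscalationPolicy:
    max_failures: int              # the N from the quoted text
    hard_recovery: str = "fence"   # the [configurable] hard recovery
    # Failures seen so far, keyed by node name. A node that has never
    # run the resource implicitly starts at 0.
    failures: dict = field(default_factory=lambda: defaultdict(int))

    def __post_init__(self):
        if self.hard_recovery not in HARD_RECOVERIES:
            raise ValueError(f"unknown hard recovery: {self.hard_recovery}")

    def on_failure(self, node: str) -> str:
        """Record one failure on `node` and return the response to take."""
        # Assumption: every failure counts toward N, whatever operation failed.
        self.failures[node] += 1
        if self.failures[node] < self.max_failures:
            return "restart"       # soft recovery: stop + start
        return self.hard_recovery  # Nth failure on this node: escalate
```

<div>With max_failures=3 and hard_recovery="fence", three failures on the same node would give restart, restart, fence, and the first failure on a different node would give restart again, since the count is per node. Counting per operation type instead would mean keying the dict on (node, operation), which is exactly the kind of detail the quoted text says still needs deciding.</div>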