<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot <span dir="ltr"><<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On 09/22/2016 09:53 AM, Jan Pokorný wrote:<br>
> On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote:<br>
>> Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> writes:<br>
>><br>
>>> I'm not saying it's a bad idea, just that it's more complicated than it<br>
>>> first sounds, so it's worth thinking through the implications.<br>
>><br>
>> Thinking about it and looking at how complicated it gets, maybe what<br>
>> you'd really want, to make it clearer for the user, is the ability to<br>
>> explicitly configure the behavior, either globally or per-resource. So<br>
>> instead of having to tweak a set of variables that interact in complex<br>
>> ways, you'd configure something like rule expressions,<br>
>><br>
>> &lt;on_fail&gt;<br>
>> &lt;restart repeat="3" /&gt;<br>
>> &lt;migrate timeout="60s" /&gt;<br>
>> &lt;fence/&gt;<br>
>> &lt;/on_fail&gt;<br>
>><br>
>> So, try to restart the service 3 times, if that fails migrate the<br>
>> service, if it still fails, fence the node.<br>
>><br>
>> (obviously the details and XML syntax are just an example)<br>
>><br>
>> This would then replace on-fail, migration-threshold, etc.<br>
><br>
> I must admit that in previous emails in this thread, I wasn't able to<br>
> follow during the first pass, which is not the case with this procedural<br>
> (sequence-ordered) approach. Though someone can argue it doesn't take<br>
> type of operation into account, which might again open the door for<br>
> non-obvious interactions.<br>
<br>
</span>"restart" is the only on-fail value that it makes sense to escalate.<br>
<br>
block/stop/fence/standby are final. Block means "don't touch the<br>
resource again", so there can't be any further response to failures.<br>
Stop/fence/standby move the resource off the local node, so failure<br>
handling is reset (there are 0 failures on the new node to begin with).<br>
<br>
"Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures<br>
then migrate", but I can't think of a real-world situation where that<br>
makes sense, </blockquote><div><br></div><div>really?</div><div><br></div><div>it is not uncommon to hear "I know it's failed, but I don't want the cluster to do anything until it's _really_ failed" </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">and it would be a significant re-implementation of "ignore"<br>
(which currently ignores the state of having failed, as opposed to a<br>
particular instance of failure).<br></blockquote><div><br></div><div>agreed</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
What the interface needs to express is: "If this operation fails,<br>
optionally try a soft recovery [always stop+start], but if &lt;N&gt; failures<br>
occur on the same node, proceed to a [configurable] hard recovery".<br>
<br>
And of course the interface will need to be different depending on how<br>
certain details are decided, e.g. whether any failures count toward &lt;N&gt;<br>
or just failures of one particular operation type, and whether the hard<br>
recovery type can vary depending on what operation failed.<br>
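<br>
One way to read the requirement above is as a per-node failure counter with a threshold. The following is only a minimal sketch under stated assumptions: every name in it (RecoveryPolicy, on_failure, threshold, hard_recovery) is hypothetical and not part of Pacemaker, and it assumes that any failure of the resource counts toward N regardless of operation type, which is exactly one of the details still to be decided.<br>

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RecoveryPolicy:
    """Hypothetical escalation policy; not Pacemaker's actual API."""
    threshold: int              # N: failures on one node before hard recovery
    hard_recovery: str          # configurable, e.g. "fence", "standby", "stop"
    soft_recovery: bool = True  # whether to try stop+start before escalating
    # Failure counts keyed by (resource, node); a new node starts at 0.
    failures: dict = field(default_factory=lambda: defaultdict(int))

    def on_failure(self, resource: str, node: str) -> str:
        """Record one failure and return the action to take."""
        key = (resource, node)
        self.failures[key] += 1
        if self.failures[key] >= self.threshold:
            return self.hard_recovery
        # Assumption: without soft recovery, nothing is done until N is reached.
        return "restart" if self.soft_recovery else "none"


policy = RecoveryPolicy(threshold=3, hard_recovery="fence")
for _ in range(3):
    print(policy.on_failure("db", "node1"))  # restart, restart, fence
```

With soft_recovery=False the same sketch covers the "don't do anything until it has really failed" case: failures are counted but no action is returned until the threshold is reached.<br>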
<div class="gmail-HOEnZb"><div class="gmail-h5"><br>
______________________________<wbr>_________________<br>
Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
<a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/<wbr>mailman/listinfo/users</a><br>
</div></div></blockquote></div><br></div></div>