[Pacemaker] Moving Resources Due to Failure

Arnold Krille arnold at arnoldarts.de
Sat Apr 14 05:57:28 EDT 2012


On Saturday 14 April 2012 13:24:29 S, MOHAMED ** CTR ** wrote:
> The Pacemaker_Explained.pdf document says that
> " setting of migration-threshold=2 and failure-timeout=60s would cause the
> resource to move to a new node after 2 failures, and allow it to move back
> (depending on the stickiness and constraint scores) after one minute."
> 
> Can you please help me understand what will happen in the following
> scenarios in a 2-node active/passive configuration?
> 1 - If one resource failed twice within 60s, it will move to the other node.
> This is clear to understand.

Yep.

> 2 - If one resource failed once and there is no failure within 60s, will
> Pacemaker reset the failcounts of that resource, so that the failcounts are
> tracked freshly? Will the failcounts get reset if the migration-threshold
> wasn't reached within the failure-timeout period?

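The fail-count is reset to zero once the failure-timeout has passed. So in your
example, once 60 seconds have passed since the last failure, the resource can
fail again without being moved. (Pacemaker only notices the expiry when it next
re-evaluates the cluster, at the latest after cluster-recheck-interval, so the
reset may happen somewhat later than exactly 60 seconds.)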
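
To make the timing concrete, here is a minimal Python sketch of the fail-count
bookkeeping for one resource on one node, assuming migration-threshold=2 and
failure-timeout=60s as in your example. FailCounter and its methods are
hypothetical illustrations, not Pacemaker code; the model ignores stickiness,
constraints and the delay until the cluster re-checks expired failures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailCounter:
    """Hypothetical model of one resource's fail-count on one node."""
    migration_threshold: int = 2
    failure_timeout: float = 60.0  # seconds
    count: int = 0
    last_failure: Optional[float] = None

    def expire(self, now: float) -> None:
        # All recorded failures expire together once failure_timeout
        # seconds have passed since the most recent one.
        if (self.last_failure is not None
                and now - self.last_failure >= self.failure_timeout):
            self.count = 0
            self.last_failure = None

    def record_failure(self, now: float) -> bool:
        """Record a failure at time `now`; True means the resource must move."""
        self.expire(now)
        self.count += 1
        self.last_failure = now
        return self.count >= self.migration_threshold


fc = FailCounter()
print(fc.record_failure(0))    # False: first failure, restart in place
print(fc.record_failure(30))   # True: second failure within 60 s, move away

fc2 = FailCounter()
fc2.record_failure(0)
print(fc2.record_failure(90))  # False: the first failure expired at t=60
```
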
Note that a failure means the monitor action timed out or didn't return
"OCF_SUCCESS" (the code a monitor uses to report "running") when it was
supposed to. The cluster then increments the fail-count, stops the resource
and starts it again, on the same node if possible (while the fail-count is
still below migration-threshold), or otherwise on a different node.
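
The check and the recovery described above can be sketched roughly like this.
The return-code values are the standard OCF ones (only a subset is listed);
monitor_failed and recovery_plan are hypothetical helpers, and a timeout is
modelled as rc=None purely for illustration.

```python
from enum import IntEnum
from typing import List, Optional


class OCF(IntEnum):
    """A subset of the standard OCF resource agent return codes."""
    SUCCESS = 0          # for a monitor: the resource is running
    ERR_GENERIC = 1
    NOT_RUNNING = 7
    RUNNING_MASTER = 8   # running in the master role


def monitor_failed(rc: Optional[int], expected: int = OCF.SUCCESS) -> bool:
    """True if the monitor timed out (rc is None) or returned anything other
    than the expected code (OCF_SUCCESS for a started resource)."""
    return rc is None or rc != expected


def recovery_plan(fail_count: int, migration_threshold: int) -> List[str]:
    """Hypothetical outline of the recovery that follows a failed monitor."""
    fail_count += 1
    if fail_count < migration_threshold:
        target = "the same node"
    else:
        target = "another node"
    return [f"fail-count = {fail_count}", "stop", f"start on {target}"]


print(monitor_failed(OCF.SUCCESS))      # False
print(monitor_failed(OCF.NOT_RUNNING))  # True
print(monitor_failed(None))             # True (timeout)
print(recovery_plan(0, 2))  # ['fail-count = 1', 'stop', 'start on the same node']
print(recovery_plan(1, 2))  # ['fail-count = 2', 'stop', 'start on another node']
```
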
When the failing resource is in a group, all the resources listed after it in
that group (which depend on it) are stopped and restarted too.
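
A rough Python sketch of the resulting order, assuming the usual group
semantics (members start in list order and stop in reverse order).
group_recovery and the member names are hypothetical, chosen only for
illustration.

```python
from typing import List


def group_recovery(group: List[str], failed: str) -> List[str]:
    """Stop/start order when one member of a group fails (a sketch).

    Members listed after `failed` depend on it and are restarted with it;
    members listed before it keep running.
    """
    affected = group[group.index(failed):]
    stops = [f"stop {name}" for name in reversed(affected)]
    starts = [f"start {name}" for name in affected]
    return stops + starts


# Hypothetical group: IP address, filesystem, database, web server.
for step in group_recovery(["ip", "fs", "db", "web"], "fs"):
    print(step)
# stop web, stop db, stop fs, start fs, start db, start web ("ip" is untouched)
```
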
If the failed resource then fails its stop action, that is a serious fault
which calls for fencing the whole node, to get the resource back into a sane
and known state.
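
In terms of the on-fail defaults, this can be sketched as follows: most
operations default to a restart, while a failed stop defaults to fencing when
fencing (stonith-enabled) is on, and to "block" (no further actions on the
resource) when it is off. default_on_fail is a hypothetical helper, not a
Pacemaker function.

```python
def default_on_fail(operation: str, stonith_enabled: bool = True) -> str:
    """Default reaction to a failed operation (a sketch of the on-fail defaults).

    A failed stop leaves the resource in an unknown state, so only fencing
    the node makes it safe to start the resource elsewhere.
    """
    if operation == "stop":
        return "fence" if stonith_enabled else "block"
    return "restart"


print(default_on_fail("monitor"))      # restart
print(default_on_fail("stop"))         # fence
print(default_on_fail("stop", False))  # block
```
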
If the resource fails to start, the fail-count is set straight to INFINITY
(1000000 in Pacemaker, with the default start-failure-is-fatal=true). That
prevents the resource from starting on that node until you as admin clean it
up, or until the failure-timeout expires if one is set. Or the whole node is
fenced due to some other circumstance...
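
A small Python sketch of why one failed start is enough to ban the node:
INFINITY is Pacemaker's score value (1000000), while
fail_count_after_failed_start and node_banned are hypothetical helpers that
assume the node becomes ineligible once its fail-count reaches
migration-threshold.

```python
INFINITY = 1_000_000  # Pacemaker caps scores and fail-counts at INFINITY


def fail_count_after_failed_start(fail_count: int,
                                  start_failure_is_fatal: bool = True) -> int:
    """New fail-count on a node after a start action fails there."""
    if start_failure_is_fatal:  # the cluster default
        return INFINITY
    return min(fail_count + 1, INFINITY)


def node_banned(fail_count: int, migration_threshold: int) -> bool:
    """Hypothetical helper: the node may no longer host the resource once its
    fail-count reaches migration-threshold. INFINITY reaches any threshold."""
    return fail_count >= migration_threshold


count = fail_count_after_failed_start(0)
print(count, node_banned(count, 2))  # 1000000 True
count = fail_count_after_failed_start(0, start_failure_is_fatal=False)
print(count, node_banned(count, 2))  # 1 False
```
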

Have fun,

Arnold