[ClusterLabs] Gracefully delaying (cloned) resource startup
bubble at hoster-ok.com
Wed Feb 25 03:34:56 EST 2015
25.02.2015 10:20, Andrei Borzenkov wrote:
> Consider a replicated resource that is represented as master/slave. When
> the local RA starts and finds the local resource in the "primary" state,
> it cannot automatically assume the resource should be master: both ends
> can be in the "primary" state after a failover (e.g. after a node
> failure). Consider this scenario:
> - node A runs primary (master)
> - node A fails over to node B
> - both nodes have to be switched off (power outage, maintenance work, ...)
> - after switching on, only node A comes up, for whatever reason
> At this point the local resource on node A is still in the "primary" state,
> but with stale content. So we need to wait until node B is actually
> available and check the state of the resource there before we can take any
> action. One possible action is to freeze until manual administrator
> intervention ...
> I could not find how to implement this in Pacemaker. What we can do is
> 1) pretend the resource is started (by going to "slave") and actually
> initiate resource startup later, in the monitor script.
> 2) fail the startup request
> The former means reduced visibility: from the user's point of view the
> resource is started while it actually is not. The latter means that at some
> point we exceed the failure threshold and manual administrator
> intervention is needed.
> What I'd actually like is the ability to say "delay startup until all
> nodes are available", with some option to manually "force master" if
> necessary. Maybe I am missing something obvious here, but I could not find
> how it can be done.
> Thank you for any hints!
Maybe the wait_for_all option of the corosync 2 votequorum service, combined
with no-quorum-policy=freeze, would help?
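
A minimal sketch of what that could look like, assuming a two-node cluster
as in the scenario above (the two_node setting is my assumption, not
something stated in the original question). With wait_for_all set, a
cluster that starts from cold does not become quorate until all nodes have
been seen at the same time, so Pacemaker should not start resources on a
lone node A:

```
# /etc/corosync/corosync.conf (corosync 2.x), quorum section only
quorum {
    provider: corosync_votequorum
    # Assumption: exactly two nodes. two_node already implies
    # wait_for_all, it is spelled out here only for clarity.
    two_node: 1
    # After a cold start, stay inquorate until every node has been
    # seen at least once at the same time.
    wait_for_all: 1
}

# Pacemaker side (cluster property, set in the CIB, not in corosync.conf):
#   crm_attribute --type crm_config --name no-quorum-policy --update freeze
```

This only delays startup until the peer shows up; it does not by itself
decide which side has the current data, and I have not checked how a manual
"force master" would fit in with it.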