[ClusterLabs] Gracefully delaying (cloned) resource startup

Andrei Borzenkov arvidjaar at gmail.com
Wed Feb 25 02:20:35 EST 2015


Consider a replicated resource that is represented as master/slave. When
the local RA starts and finds the local resource in "primary" state, it
cannot automatically assume the resource should be master: it is possible
to have both ends in "primary" state after a failover (e.g. after a node
failure). Consider this scenario:

- node A runs primary (master)
- node A fails over to node B
- both nodes have to be switched off (power outage, maintenance work, ...)
- after power is restored, only node A comes up, for whatever reason

At this point the local resource on node A is still in "primary" state,
but its content is stale. So we need to wait until node B is actually
available and check the state of the resource on node B before we can
take any action. One possible action is to freeze until the administrator
intervenes manually ...
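The point that a local "primary" flag alone cannot be trusted can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not how any particular RA works: it assumes each replica carries a hypothetical `generation` number that is bumped on every promotion, so that the copy promoted most recently can be identified once both nodes are visible. The names `Replica` and `decide` are also hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    # Hypothetical per-node view of the replicated resource.
    role: str        # "primary" or "secondary"
    generation: int  # assumed counter, bumped each time this copy is promoted


def decide(local: Replica, peer: Optional[Replica]) -> str:
    """Decide what to do with the local replica at start time.

    peer is None when the other node is not (yet) reachable.
    """
    if local.role != "primary":
        return "start-as-slave"
    if peer is None:
        # Local "primary" may be stale; without the peer we cannot tell.
        return "wait"
    if peer.generation > local.generation:
        # Peer was promoted after us (e.g. failover A -> B): our data is stale.
        return "demote-and-resync"
    if peer.generation == local.generation and peer.role == "primary":
        # Both claim the same generation: ambiguous, leave it to the admin.
        return "wait"
    return "promote"


# The scenario from the post: A was primary (generation 5), B took over
# (generation 6), then only A comes back.
node_a = Replica("primary", 5)
node_b = Replica("primary", 6)
print(decide(node_a, None))    # A alone: cannot decide yet
print(decide(node_a, node_b))  # once B is visible, A turns out to be stale
```

The key design choice is that "peer unknown" maps to waiting rather than promoting: only information from node B can reveal that node A's copy is stale.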

I could not find a way to implement this in Pacemaker. What we can do is

1) pretend the resource is started (by going to "slave") and actually
initiate the resource startup later, from the monitor script.

2) fail the start request

The former reduces visibility: from the user's point of view the resource
is started while it actually is not. The latter means that at some
point we exceed the failure threshold, and the resource then needs manual
administrator intervention.

What I'd actually like is the ability to say "delay startup until all
nodes are available", with some option to manually "force master" if
necessary. Maybe I am missing something obvious here, but I could not
find how it can be done.
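The requested behaviour, and why the two workarounds fall short, can be sketched as follows. This is a hypothetical policy, not a Pacemaker feature: `start_gate`, `force_master` and the `DEFER` decision are illustrative names. Only the exit codes `OCF_SUCCESS = 0` and `OCF_ERR_GENERIC = 1` are standard OCF values. The sketch shows that a start action can only report success or failure, so a "defer" decision has to be squeezed into one of them.

```python
from __future__ import annotations

from enum import Enum

# Standard OCF resource agent exit codes used here.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


class Decision(Enum):
    START = "start"
    DEFER = "defer"  # no direct equivalent exists for a start action


def start_gate(expected: set[str], online: set[str], force_master: bool) -> Decision:
    # Hypothetical policy: start only once every expected node is online,
    # unless the administrator explicitly forces it.
    if force_master or expected <= online:
        return Decision.START
    return Decision.DEFER


def start_exit_code(decision: Decision, workaround: int) -> int:
    # Map DEFER onto one of the two workarounds from the post.
    if decision is Decision.START:
        return OCF_SUCCESS
    if workaround == 1:
        # 1) report success as slave; the real startup happens later in monitor.
        return OCF_SUCCESS
    if workaround == 2:
        # 2) fail the start; this counts towards the failure threshold.
        return OCF_ERR_GENERIC
    raise ValueError("workaround must be 1 or 2")


d = start_gate({"A", "B"}, {"A"}, force_master=False)
print(d, start_exit_code(d, 1), start_exit_code(d, 2))
```

With workaround 1 a deferred start and a real start return the same code, which is exactly the loss of visibility described above. With workaround 2 every deferred start is recorded as a failure.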

Thank you for any hints!

-andrei