[Pacemaker] Possible bug with mandatory ordering involving stateful (i.e. master-slave) resources

Thu Oct 6 11:05:30 EDT 2011


I have a 2-node cluster (we are running the SLES 11 HA extension, so the
Pacemaker version is 1.1.2) in which a master-slave resource depends on
a clone resource via a mandatory ordering constraint.  From
"crm configure show":

primitive dummy ocf:heartbeat:Dummy \
        op monitor interval="15s" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s"
primitive statefuldummy ocf:heartbeat:Stateful \
        op start timeout="1800s" \
        op timeout="45s" \
        op monitor interval="10s" timeout="60s" \
        op promote timeout="45s" \
        op demote timeout="30s"
ms dummy-ms statefuldummy \
        meta target-role="Started" master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="false" ordered="false" globally-unique="false" is-managed="true"
clone dummy-clone dummy \
        meta target-role="Started"
order dummy-order inf: dummy-clone dummy-ms
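To make my expectation concrete, here is a toy Python model of what I understand the mandatory (inf:) order dummy-order to imply when dummy-clone is stopped: the dependent dummy-ms is stopped first, then the clone, and nothing starts dummy-ms again.  This is only a sketch of the intended semantics, not Pacemaker's policy engine; the names Order and plan_stop are hypothetical, and cyclic constraints are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical model of a mandatory order: 'first' must be active before 'then'."""
    first: str
    then: str


def plan_stop(target: str, orders: list[Order], active: set[str]) -> list[tuple[str, str]]:
    """Return the (action, resource) steps implied by stopping `target`.

    Dependents are stopped before the resource they depend on, and the plan
    never contains a start for a dependent while `target` stays stopped.
    Assumes the constraints contain no cycles.
    """
    steps: list[tuple[str, str]] = []
    for o in orders:
        if o.first == target and o.then in active:
            # Stop the dependent (and, recursively, its own dependents) first.
            for step in plan_stop(o.then, orders, active):
                if step not in steps:
                    steps.append(step)
    if target in active:
        steps.append(("stop", target))
    return steps


# order dummy-order inf: dummy-clone dummy-ms
orders = [Order("dummy-clone", "dummy-ms")]
active = {"dummy-clone", "dummy-ms"}
print(plan_stop("dummy-clone", orders, active))
# [('stop', 'dummy-ms'), ('stop', 'dummy-clone')]
```

Under this reading, steps 2 and 3 below match the plan, and any later start of dummy-ms has no counterpart in it.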

(I reproduced the problem we are experiencing with dummy resources to
try to eliminate the resource agents (RAs) for our real resources as the
source of the issue.)

The order of events is as follows:

1) Force a shutdown of dummy-clone via "crm resource stop dummy-clone".

2) Logs show that the CRM stops both the master and slave statefuldummy
resources of dummy-ms.  Good.

3) Logs show that the CRM stops the dummy-clone resources.  Good.

4) Logs immediately show that the CRM starts the master and slave
statefuldummy resources of dummy-ms.  Bad.

5) Logs show the CRM stopping the statefuldummy resources again.
Good?
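One way to state why step 4 looks wrong: under a mandatory order, the dependent must never be started while the first resource is stopped.  The Python sketch below checks an event sequence against that rule.  The (action, resource) tuples are my own paraphrase of steps 2 to 5, not actual log lines, and the function name order_violations is hypothetical.

```python
def order_violations(events, first, then):
    """Return indices of events where `then` is started while `first` is stopped.

    `events` is a list of (action, resource) pairs in log order.  Both
    resources are assumed to be running before the first event.  Actions
    other than "start" and "stop" (e.g. monitor) do not change state here.
    """
    first_active = True
    violations = []
    for i, (action, resource) in enumerate(events):
        if resource == first:
            if action == "start":
                first_active = True
            elif action == "stop":
                first_active = False
        elif resource == then and action == "start" and not first_active:
            violations.append(i)
    return violations


# Steps 2-5 above, collapsed to one event per step.
events = [
    ("stop", "dummy-ms"),      # 2) master and slave stopped
    ("stop", "dummy-clone"),   # 3) clone instances stopped
    ("start", "dummy-ms"),     # 4) unexpected restart
    ("stop", "dummy-ms"),      # 5) stopped again
]
print(order_violations(events, "dummy-clone", "dummy-ms"))  # [2]
```

Index 2 corresponds to step 4; the same check reports nothing for a sequence where dummy-clone is started again before dummy-ms.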

Has anyone seen something similar?  My understanding of ordering
constraints tells me that event #4 is erroneous behaviour.  I would not
expect the statefuldummy resources to be restarted until a
"crm resource start dummy-clone" command is issued.  If other types of
resources depend on the clone, such as another clone or a group, they
behave as I would expect.  It seems to be only with master-slave
resources that the CRM tries to start the resource inappropriately.

In our real cluster, the master-slave resource returns an error
(OCF_ERR_GENERIC) when it is started while its prerequisite resource is
not running.  In this case, event #5 does not happen, and the
master-slave resource is never restarted, even after the prerequisite
clone resource is restarted via "crm resource start <resource-name>".
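For reference, the start guard in our real agent behaves roughly like the sketch below.  Real agents are normally shell scripts; Python is used here only for illustration.  The return codes are the standard OCF values (OCF_SUCCESS is 0, OCF_ERR_GENERIC is 1), while start_action and its prerequisite_running argument are hypothetical stand-ins for the agent's actual check.

```python
from typing import Callable

# Return codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def start_action(prerequisite_running: Callable[[], bool]) -> int:
    """Sketch of an RA 'start' that refuses to start without its prerequisite.

    `prerequisite_running` is a hypothetical stand-in for whatever check the
    real agent performs before starting.
    """
    if not prerequisite_running():
        # Report a generic failure, as our real master-slave RA does.
        return OCF_ERR_GENERIC
    # The agent's actual start logic would run here.
    return OCF_SUCCESS


print(start_action(lambda: False))  # prerequisite stopped -> 1
print(start_action(lambda: True))   # prerequisite running -> 0
```

With dummy resources the premature start in event #4 succeeds and is later undone; with this guard it fails instead, which is the case where the resource is never restarted.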

Thanks for your help,

Chris King
