[Pacemaker] Possible bug with mandatory ordering involving stateful (i.e. master-slave) resources

Tue Oct 11 21:02:42 EDT 2011

On Fri, Oct 7, 2011 at 2:05 AM, King, Christopher
<CKing at broadviewnet.com> wrote:
> Possible bug with mandatory ordering involving stateful (i.e. master-slave)
> resources
>
> I have a 2-node cluster (we are running the SLES 11 HA extension, so the
> pacemaker version is 1.1.2) in which a master-slave resource is dependent on
> a clone resource via a mandatory ordering constraint.  From “crm configure
> show”:
>
> primitive dummy ocf:heartbeat:Dummy \
>         op monitor interval="15s" \
>         op start interval="0" timeout="40s" \
>         op stop interval="0" timeout="60s"
>
> primitive statefuldummy ocf:heartbeat:Stateful \
>         op start timeout="1800s" \
>         op timeout="45s" \
>         op monitor interval="10s" timeout="60s" \
>         op promote timeout="45s" \
>         op demote timeout="30s"
>
> ms dummy-ms statefuldummy \
>         meta target-role="Started" master-max="1" master-node-max="1" \
>         clone-max="2" clone-node-max="1" notify="false" ordered="false" \
>         globally-unique="false" is-managed="true"
>
> clone dummy-clone dummy \
>         meta target-role="Started"
>
> order dummy-order inf: dummy-clone dummy-ms
>
> (I reproduced the problem we are experiencing with dummy resources to try
> and eliminate the RAs for our real resources as the source of the issue.)
>
> The order of events is as follows:
>
> 1) Force a shutdown of the dummy-clone via “crm resource stop dummy-clone”.
> 2) Logs show that the crm stops both the master and slave statefuldummy
>    resources of the dummy-ms.  Good.
> 3) Logs show that the crm stops the dummy-clone resources.  Good.
> 4) Logs immediately show that the crm starts the master and slave
>    statefuldummy resources of the dummy-ms.  Bad.
> 5) Logs show the crm stopping the statefuldummy resources again.  Good?
>
>
> Has anyone seen something similar?  My understanding of the ordering
> constraints tells me that event #4 is erroneous behaviour.

Correct.  Since you're a SLES customer, I'd advise you to contact SUSE
directly - they should be able to give it the proper attention and
escalate upstream if it's not already fixed.

> I would not
> expect the statefuldummy resources to be restarted until a “crm resource
> start dummy-clone” command is issued.  If I have other types of resources
> dependent on the clone, such as another clone or a group, they behave as I
> would expect.  It seems to be only with master-slave resources that the crm
> tries to start the resource inappropriately.
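
To make the expectation concrete, here is a minimal sketch in Python of
the rule being described: with a mandatory "order first then" constraint,
"then" should never start while "first" is stopped.  This is only a toy
model for checking a sequence of log events, not how Pacemaker's policy
engine works; the function name ordering_violations and the
(resource, action) event format are made up for illustration, and it
assumes both resources are running at the outset.

```python
from typing import List, Tuple

# (resource name, action); action is "start" or "stop". Hypothetical format.
Event = Tuple[str, str]


def ordering_violations(events: List[Event], first: str, then: str) -> List[int]:
    """Return the indexes of events where `then` starts while `first` is stopped."""
    first_running = True  # assumption: both resources are running initially
    violations = []
    for index, (resource, action) in enumerate(events):
        if resource == first:
            # Track whether the prerequisite resource is currently running.
            first_running = (action == "start")
        elif resource == then and action == "start" and not first_running:
            # The dependent resource started without its prerequisite.
            violations.append(index)
    return violations


# Steps 2-5 from the report (step 1 is the admin command, not a logged action).
observed = [
    ("dummy-ms", "stop"),     # step 2
    ("dummy-clone", "stop"),  # step 3
    ("dummy-ms", "start"),    # step 4
    ("dummy-ms", "stop"),     # step 5
]

violations = ordering_violations(observed, first="dummy-clone", then="dummy-ms")
print(violations)  # [2], i.e. step 4 in the report
```

Under this model only the start in step 4 is flagged; a sequence in which
dummy-clone is started again before dummy-ms would produce no violations.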
>
> In our real cluster, the master-slave returns an error (OCF_ERR_GENERIC)
> when it is started while its prerequisite resource is not started.  In this
> case, event #5 does not happen, and the master-slave is never restarted
> again, even after the prerequisite clone resource is restarted via “crm
> resource start <resource-name>”.
>
> Thanks for your help,
>
> Chris King