[Pacemaker] Possible bug with mandatory ordering involving stateful (i.e. master-slave) resources
Andrew Beekhof
andrew at beekhof.net
Tue Oct 11 21:02:42 EDT 2011
On Fri, Oct 7, 2011 at 2:05 AM, King, Christopher
<CKing at broadviewnet.com> wrote:
> Possible bug with mandatory ordering involving stateful (i.e. master-slave)
> resources
>
>
>
> I have a 2-node cluster (we are running the SLES 11 HA extension, so the
> pacemaker version is 1.1.2) in which a master-slave resource is dependent on
> a clone resource via a mandatory ordering constraint. From “crm configure
> show”:
>
> primitive dummy ocf:heartbeat:Dummy \
>     op monitor interval="15s" \
>     op start interval="0" timeout="40s" \
>     op stop interval="0" timeout="60s"
>
> primitive statefuldummy ocf:heartbeat:Stateful \
>     op start timeout="1800s" \
>     op timeout="45s" \
>     op monitor interval="10s" timeout="60s" \
>     op promote timeout="45s" \
>     op demote timeout="30s"
>
> ms dummy-ms statefuldummy \
>     meta target-role="Started" master-max="1" master-node-max="1" \
>     clone-max="2" clone-node-max="1" notify="false" ordered="false" \
>     globally-unique="false" is-managed="true"
>
> clone dummy-clone dummy \
>     meta target-role="Started"
>
> order dummy-order inf: dummy-clone dummy-ms
>
> (I reproduced the problem we are experiencing with dummy resources to try
> and eliminate the RAs for our real resources as the source of the issue.)
>
>
>
> The order of events is as follows:
>
> 1) Force a shutdown of the dummy-clone via “crm resource stop
> dummy-clone”
> 2) Logs show that the crm stops both the master and slave statefuldummy
> resources of the dummy-ms. Good.
> 3) Logs show that the crm stops the dummy-clone resources. Good.
> 4) Logs immediately show that the crm starts the master and slave
> statefuldummy resources of the dummy-ms. Bad.
> 5) Logs show the crm stopping the statefuldummy resources again. Good?
>
>
>
> Has anyone seen something similar? My understanding of the ordering
> constraints tells me that event #4 is erroneous behaviour.
Correct. Since you're a SLES customer, I'd advise you to contact SUSE
directly - they should be able to give it the proper attention and
escalate upstream if it's not already fixed.
> I would not
> expect the statefuldummy resources to be restarted until a “crm resource
> start dummy-clone” command is issued. If I have other types of resources
> dependent on the clone, such as another clone or a group, they behave as I
> would expect. It seems to be only with master-slave resources that the crm
> tries to start the resource inappropriately.
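For readers following the thread, the behaviour expected from a mandatory
("inf:") ordering constraint can be sketched as a toy model. This is only an
illustration of the semantics described above, not Pacemaker's policy engine;
the function name allowed_to_start and the data layout are assumptions made
up for this example.

```python
# Toy model of a mandatory ordering constraint: "first" must be running
# before "then" may be started. Illustrative only; not Pacemaker code.

def allowed_to_start(then_rsc, active, target_role, order):
    """Return True if then_rsc may be started given the ordering constraints."""
    for first, then in order:
        if then == then_rsc:
            # The prerequisite must be running and not administratively stopped.
            if first not in active or target_role.get(first) == "Stopped":
                return False
    # The dependent resource itself must not be administratively stopped.
    return target_role.get(then_rsc, "Started") != "Stopped"


# Mirrors: order dummy-order inf: dummy-clone dummy-ms
order = [("dummy-clone", "dummy-ms")]

# State once "crm resource stop dummy-clone" has completed (events 1-3).
active = set()
target_role = {"dummy-clone": "Stopped", "dummy-ms": "Started"}
print(allowed_to_start("dummy-ms", active, target_role, order))

# State once "crm resource start dummy-clone" has brought the clone back.
active = {"dummy-clone"}
target_role["dummy-clone"] = "Started"
print(allowed_to_start("dummy-ms", active, target_role, order))
```

Under this model the start in event 4 would not be permitted, because
dummy-clone is stopped at that point; dummy-ms would become startable again
only after dummy-clone is running.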
>
>
>
> In our real cluster, the master-slave returns an error (OCF_ERR_GENERIC)
> when it is started while its prerequisite resource is not started. In this
> case, event #5 does not happen, and the master-slave is never again
> restarted, even after the prerequisite clone resource is restarted via “crm
> resource start <resource-name>”.
>
>
>
> Thanks for your help,
>
> Chris King