[Pacemaker] Migration from mon to pacemaker

Fri Feb 11 03:41:38 EST 2011

On 2011-02-11 09:16, Uwe Schmeling wrote:
> Hi,
> 
> I'm just migrating my recent mon/heartbeat configuration to pacemaker.
> The point of interest is the webservice behavior. Previously, the monitor
> checked whether the service failed twice within 20 seconds, and if so, a
> switch to the other node was initiated. Now I'm trying to configure the same
> behavior using pacemaker. The webservice is monitored every 10 seconds
> (interval=10), and failure-timeout is set to 20s (expecting to ignore all
> failures within this time frame)

That is *not* what failure-timeout means. Please reread the docs.

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html

> and it should only happen if a "valid
> failure" occurs twice (migration-threshold=2). Valid-failure means: the
> service fails twice within 20s but is ignored if the service is back
> within 20s.

There is no such thing in Pacemaker as the "valid failure" you're
talking about.

> This is the configuration, which is used to implement this
> behavior:
> 
> node lbv01 \
>         attributes standby="off"
> node lbv02 \
>         attributes standby="off"
> primitive apacheIP ocf:heartbeat:IPaddr2 \
>         params ip="10.6.151.190" \
>         op monitor interval="10s" \
>         meta is-managed="true"
> primitive haproxyIP ocf:heartbeat:IPaddr2 \
>         params ip="10.6.151.191" \
>         op monitor interval="10s"
> primitive pingd ocf:pacemaker:ping \
>         params host_list="10.6.151.11" multiplier="100" \
>         op monitor interval="15s" timeout="5s"
> *primitive webservice ocf:heartbeat:webservices \
>         op monitor on-fail="ignore" interval="10s" \
>         meta failure-timeout="20s" migration-threshold="2"*
> group webservice-ips haproxyIP apacheIP webservice \
>         meta target-role="Started"
> colocation all-resources inf: webservice-ips pingd
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1297249441" \
>         cluster-delay="30"
> 
> If a webservice monitoring failure is forced, the switchover is performed
> immediately, ignoring timeout and threshold.

I already pointed out that you've got a false impression of
failure-timeout, so that's irrelevant here.

Could it be that you are not just forcing the monitoring failure, but
also keeping the service from restarting? Some "chmod -x" trick? Because
that makes your monitor fail *and* the subsequent restart, and it's that
failing restart that would cause your migration.

Or else your "webservices" agent exits with $OCF_ERR_INSTALLED on your
monitor failure, which will also cause a prompt migration.
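
To illustrate the distinction between the two exit codes, here is a rough
sketch of a monitor function. It is not your actual agent: the function name
and the WEBSERVICE_BIN and WEBSERVICE_PIDFILE variables are made up for
illustration, and the numeric codes are defined locally only so the snippet
stands alone (a real agent would get them from ocf-shellfuncs).

```shell
#!/bin/sh
# Standard OCF return codes; defined here only to keep the sketch self-contained.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_INSTALLED:=5}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical monitor logic. WEBSERVICE_BIN and WEBSERVICE_PIDFILE are
# assumed names, not parameters of any existing agent.
webservice_monitor() {
    bin="${WEBSERVICE_BIN:-/usr/sbin/webservice}"
    pidfile="${WEBSERVICE_PIDFILE:-/var/run/webservice.pid}"

    # Binary missing or not executable: the software is unusable on this node.
    if [ ! -x "$bin" ]; then
        return "$OCF_ERR_INSTALLED"
    fi

    # Installed, but no live process behind the PID file: plain "not running".
    if [ ! -r "$pidfile" ] || ! kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return "$OCF_NOT_RUNNING"
    fi

    return "$OCF_SUCCESS"
}
```

With logic like this, a "chmod -x" on the binary yields $OCF_ERR_INSTALLED
rather than $OCF_NOT_RUNNING, which is the case that gets the resource moved
away promptly.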

Btw, when you write your own RA, *please* don't install it into the
"heartbeat" provider directory; instead, create your own provider directory.
Otherwise a casual observer will think you're talking about a resource
agent that lives in our upstream repo, which for your "webservices"
agent is clearly not the case.
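
A minimal sketch of what that could look like, assuming the usual OCF root of
/usr/lib/ocf and a made-up provider name "mycompany". install_agent is a
hypothetical helper written for this example, not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: copy a resource agent script into its own provider
# directory below the OCF root instead of resource.d/heartbeat.
install_agent() {
    src="$1"        # path to the agent script, e.g. ./webservices
    provider="$2"   # your own provider name, e.g. mycompany (assumed)
    ocf_root="${OCF_ROOT:-/usr/lib/ocf}"
    dest="$ocf_root/resource.d/$provider"

    # Create the provider directory, copy the agent, and make it executable.
    mkdir -p "$dest" &&
        cp "$src" "$dest/" &&
        chmod 0755 "$dest/$(basename "$src")"
}

# Example invocation (run as root on every node):
#   install_agent ./webservices mycompany
```

The resource would then be configured as ocf:mycompany:webservices rather
than ocf:heartbeat:webservices.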

Florian
