[ClusterLabs] Wait until resource is really ready before moving clusterip

Joakim Hansson joakim.hansson87 at gmail.com
Thu Jan 14 12:46:44 UTC 2016


>
> >> Hi,
> >>
> >> There is the ocf:heartbeat:Delay resource agent, which on one hand is
> >> documented as a test resource, but on the other hand should do what you
> >> need:
> >>
> >> primitive solr ...
> >> primitive two-minute-delay ocf:heartbeat:Delay \
> >>   params startdelay=120 meta target-role=Started \
> >> op start timeout=180
> >> group solr-then-wait solr two-minute-delay
> >>
> >> Now the group acts basically like the solr resource, except for the
> >> two-minute delay after starting solr before the group itself is
> >> considered started.
> >>
> >> Cheers,
> >> Kristoffer
> >>
> >>>
> >>> / Jocke
> >
> >Another way would be to customize the tomcat resource agent so that
> >start doesn't return success until it's fully ready to accept requests
> >(which would probably be specific to whatever app you're running via
> >tomcat). Of course you'd need a long start timeout.
>
Thanks for the tips, guys!
I'm using the systemd RA for tomcat (I know it's not recommended) and can't
figure out how to postpone the success return from start.
Maybe I'll try the OCF one later.
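
For the OCF route, I assume the idea boils down to a polling loop at the
end of the agent's start action. A rough sketch of that, not tested in a
real cluster: the function name wait_until_ready, the timeout/interval
values and the probe command are all my own assumptions, not something
taken from the shipped tomcat agent.

```shell
#!/bin/sh
# Hypothetical helper (not part of any shipped resource agent):
# retry a readiness probe until it succeeds or the deadline passes.
# Usage: wait_until_ready TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_ready() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # The probe exits 0 once the application answers requests.
        if "$@" >/dev/null 2>&1; then
            return 0    # same value as OCF_SUCCESS
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1            # same value as OCF_ERR_GENERIC
}
```

At the end of start this could be called as something like
"wait_until_ready 120 5 curl -fs http://localhost:8080/solr/" (the URL is
a guess for my setup), with the start op timeout set higher than the 120
seconds so Pacemaker doesn't give up first.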

When adding the Delay RA it starts throwing a bunch of errors and the
cluster starts fencing the nodes one by one.

The errors I get with "pcs status":

Failed Actions:
* Delay_monitor_0 on node03 'unknown error' (1): call=51, status=Timed Out, exitreason='none',
    last-rc-change='Thu Jan 14 13:30:14 2016', queued=0ms, exec=30002ms
* Delay_monitor_0 on node01 'unknown error' (1): call=53, status=Timed Out, exitreason='none',
    last-rc-change='Thu Jan 14 13:30:14 2016', queued=0ms, exec=30002ms
* Delay_monitor_0 on node02 'unknown error' (1): call=51, status=Timed Out, exitreason='none',
    last-rc-change='Thu Jan 14 13:30:14 2016', queued=0ms, exec=30006ms

and in the /var/log/pacemaker.log:

https://github.com/apepojken/pacemaker-errors/blob/master/ocf:heartbeat:Delay

I added the Delay RA with:

pcs resource create Delay ocf:heartbeat:Delay \
startdelay="120" meta target-role=Started \
op start timeout="180"

and my config looks like this:

https://github.com/apepojken/pacemaker/blob/master/Config

Am I missing something obvious here?

Thanks again for all the help so far!