[ClusterLabs] Antw: Re: Antw: Delayed first monitoring
Ulrich.Windl at rz.uni-regensburg.de
Thu Aug 13 04:35:43 EDT 2015
>>> Miloš Kozák <milos.kozak at lejmr.com> wrote on 13.08.2015 at 09:01 in
<55CC40B1.7090904 at lejmr.com>:
> This does not make sense at all. Presumably, Pacemaker should get
> along with LSB scripts which come from the system repository, right?
I don't think pacemaker is a technology to handle broken start scripts.
Being able to use LSB scripts probably was there to manage something without
having an OCF RA for it, but we generally don't use LSB scripts for HA here.
> Therefore, there is no way to modify the LSB script, because changes to
> the LSB script are erased after every package update.
That's double nonsense: Contact your support to get the scripts fixed. Besides
that, there is absolutely no reason not to copy the script under a different
name and fix that.
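A minimal sketch of what such a fix could look like, assuming the script has
been copied to a hypothetical /etc/init.d/httpd-ha: the helper below (the name
wait_until_running is made up for illustration) polls a status command until it
succeeds or a timeout expires, so that "start" only reports success once the
daemon is actually up.

```shell
# Hypothetical helper for a private copy of the init script.
# Polls a status command once per second until it exits 0 ("running")
# or until the timeout (in seconds, default 30) has elapsed.
wait_until_running() {
    status_cmd=$1        # command whose exit status 0 means "running"
    timeout=${2:-30}     # maximum number of seconds to wait
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if sh -c "$status_cmd" >/dev/null 2>&1; then
            return 0     # service reports running
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1             # still not running after the timeout
}

# Assumed usage at the end of start() in the copied script:
#   wait_until_running "/etc/init.d/httpd-ha status" 30 || exit 1
```

The copy would then be referenced as lsb:httpd-ha instead of lsb:httpd when
creating the resource, and it is not overwritten by package updates.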
> I believe the systematic approach would be to introduce delayed
> monitoring or something like it into Pacemaker. I wonder why
> nobody has come across this problem already?
Maybe because there's an apache RA?
# man -k apache | grep -i OCF
ocf_heartbeat_apache (7) - Manages an Apache Web server instance
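For reference, a resource based on that agent might be created roughly as
follows. This is only a sketch: the config file path and the status URL are
assumptions (they follow the usual CentOS layout and need mod_status enabled),
and the monitor interval is an arbitrary example value.

```shell
# Assumed values: config path, status URL and interval are illustrative only.
pcs resource create httpd ocf:heartbeat:apache \
    configfile=/etc/httpd/conf/httpd.conf \
    statusurl="http://localhost/server-status" \
    op monitor interval=1min
```

As far as I know, the agent's start action itself waits until the server
responds, which is what avoids the race with the first monitor.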
> On 13.8.2015 at 08:44, Ulrich Windl wrote:
>> I think the start script has to be fixed to return success when httpd is
>> actually running.
>>>>> Miloš Kozák <milos.kozak at lejmr.com> wrote on 12.08.2015 at 16:03 in
>> <55CB521A.8090304 at lejmr.com>:
>>> I have set up Corosync+CMAN+Pacemaker on CentOS 6.5 in order to
>>> provide high availability for opennebula. However, I am facing a
>>> strange problem which arises from my lack of knowledge.
>>> In the log I can see that when I create a resource based on an init
>>> script, typically:
>>> pcs resource create httpd lsb:httpd
>>> The httpd daemon gets started, but a monitor is initiated at the same time
>>> and the resource is identified as not running. This behaviour makes
>>> sense, given that starting the daemon takes some time. In this
>>> particular case, I get error code 2, which means that the process is
>>> running but the environment is not locked. The effect of this is that the
>>> httpd resource gets restarted.
>>> My workaround is an extra sleep in the status function of the init script,
>>> but I don't like this solution at all! Do you have any idea how to tackle
>>> this problem in a proper way? I expected an op attribute which would specify
>>> a delay between service start and first monitoring, but I could not find it.
>>> Thank you, Milos
>>> Users mailing list: Users at clusterlabs.org
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org