[ClusterLabs] Antw: Delayed first monitoring

Andrew Beekhof andrew at beekhof.net
Sun Aug 16 22:22:43 UTC 2015


> On 13 Aug 2015, at 5:01 pm, Miloš Kozák <milos.kozak at lejmr.com> wrote:
> 
> However, this does not make sense at all. Presumably, Pacemaker should get along with LSB scripts that come from the system repository, right?

Explicitly no.
We get along only with /LSB compliant/ init scripts.  

Not all meet these criteria.
Debian’s init scripts were some of the biggest offenders for many, many years.
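
For what it's worth, a rough way to see whether a given script is one of the offenders is to drive it through the start/status/stop sequence by hand and compare exit codes. The sketch below is only an illustration: the names lsb_check and lsb_expect are made up for this example, it assumes the service is stopped when the check begins, and it really does start and stop the service, so it belongs on a test machine. The expected codes are the LSB ones: 0 for success, and 3 from status when the service is not running.

```shell
#!/bin/sh
# Hypothetical helper names (lsb_check, lsb_expect); not part of Pacemaker or pcs.
LSB_FAILURES=0

# lsb_expect SCRIPT ACTION WANTED_CODE DESCRIPTION
# Runs "SCRIPT ACTION" and records a failure if the exit code differs.
lsb_expect() {
    rc=0
    "$1" "$2" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ne "$3" ]; then
        echo "FAIL: $4 (wanted exit code $3, got $rc)"
        LSB_FAILURES=$((LSB_FAILURES + 1))
    fi
}

# lsb_check SCRIPT
# Assumes the service is stopped beforehand. Returns 0 only if every
# expectation holds.
lsb_check() {
    LSB_FAILURES=0
    lsb_expect "$1" start  0 "start from stopped"
    lsb_expect "$1" status 0 "status right after start"
    lsb_expect "$1" start  0 "start when already running"
    lsb_expect "$1" stop   0 "stop from running"
    lsb_expect "$1" status 3 "status after stop"
    lsb_expect "$1" stop   0 "stop when already stopped"
    [ "$LSB_FAILURES" -eq 0 ]
}
```

Something like `lsb_check /etc/init.d/httpd` would then print one FAIL line per unmet expectation. A script whose start returns before the daemon is really up tends to fail the "status right after start" check.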

A program such as Pacemaker needs (for example) sane return codes, for start to actually complete before returning, and for starting something that's already started not to be an error.
A human can gloss over these things; Pacemaker is not quite smart enough to know when these kinds of errors are OK.
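
As an illustration of those three points, here is a minimal sketch of the relevant functions. It is not taken from any distribution's httpd script: PIDFILE, DAEMON_CMD, START_TIMEOUT and the demo_* function names are placeholders, and the readiness test is just "the PID is alive", where a real script would check something service-specific (a listening port, for instance).

```shell
#!/bin/sh
# Placeholder settings for the sketch; a real init script would hard-code
# or source its own values.
PIDFILE="${PIDFILE:-/tmp/demo-daemon.pid}"
DAEMON_CMD="${DAEMON_CMD:-sleep 300}"
START_TIMEOUT="${START_TIMEOUT:-10}"

# LSB status codes used here: 0 = running, 1 = dead but pid file exists,
# 3 = not running.
demo_status() {
    [ -f "$PIDFILE" ] || return 3
    pid=$(cat "$PIDFILE" 2>/dev/null)
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null && return 0
    return 1
}

demo_start() {
    # Starting something that is already started is not an error.
    demo_status && return 0
    rm -f "$PIDFILE"
    $DAEMON_CMD >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
    # Do not return until status agrees the service is up, so that a
    # monitor issued right after start sees it running.
    i=0
    while [ "$i" -lt "$START_TIMEOUT" ]; do
        demo_status && return 0
        sleep 1
        i=$((i + 1))
    done
    return 1   # generic failure: never became ready
}

demo_stop() {
    if demo_status; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
    fi
    rm -f "$PIDFILE"
    return 0   # stopping something already stopped is not an error either
}
```

The point of the loop in demo_start is that the waiting happens in start, where it belongs, rather than as a sleep in status.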


> 
> Therefore, there is no way to modify the LSB script, because changes to it are erased after every package update.
> 
> 
> I believe the systematic approach would be to introduce delayed monitoring, or something like it, into Pacemaker. I am quite surprised that nobody has come across this problem already.
> 
> 
> Milos
> 
> 
> 
> 
> 
> On 13.8.2015 at 08:44, Ulrich Windl wrote:
>> I think the start script has to be fixed to return success when httpd is
>> actually running.
>> 
>>>>> Miloš Kozák <milos.kozak at lejmr.com> wrote on 12.08.2015 at 16:03 in
>> message
>> <55CB521A.8090304 at lejmr.com>:
>>> Hi,
>>> 
>>> I have set up Corosync+CMAN+Pacemaker on CentOS 6.5 in order to
>>> provide high availability for OpenNebula. However, I am facing a
>>> strange problem that arises from my lack of knowledge.
>>> 
>>> In the log I can see that when I create a resource based on an init
>>> script, typically:
>>> 
>>> pcs resource create httpd lsb:httpd
>>> 
>>> The httpd daemon gets started, but the monitor is initiated at the same
>>> time and the resource is identified as not running. This behaviour makes
>>> sense once we realize that starting the daemon takes some time. In this
>>> particular case, I get error code 2, which means that the process is
>>> running but the environment is not locked. The effect of this is that the
>>> httpd resource gets restarted.
>>> 
>>> My workaround is an extra sleep in the status function of the init script,
>>> but I don't like this solution at all! Do you have any idea how to tackle
>>> this problem in a proper way? I expected an op attribute that would specify
>>> a delay between service start and the first monitor, but I could not find one.
>>> 
>>> Thank you, Milos
>>> 
>>> 
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org




