[ClusterLabs] Delayed first monitoring

Miloš Kozák milos.kozak at lejmr.com
Wed Aug 12 12:19:02 EDT 2015


I tend to agree with you. However, I don't have the logs at hand to 
prove it... but that is what I saw in the logs, so I formulated my question 
as I did :D

On 12.8.2015 at 18:16, emmanuel segura wrote:
> Sorry, but from my point of view, the agent first checks whether the
> resource is running; for example, you can see that in
> /usr/lib/ocf/resource.d/heartbeat/Filesystem
>
> The logic is:
>
> Filesystem::start (the action passed as a parameter to the agent) ->
> Filesystem_start (the function called for start in the case statement
> that evaluates the parameters) -> Filesystem_status (called by the
> previous function). If the fs is already mounted, it returns success.
>
> So you need to check whether the resource is already started.
>
> 2015-08-12 16:14 GMT+02:00 Nekrasov, Alexander <alexander.nekrasov at emc.com>:
>> 1. Pacemaker will/may call a monitor before starting a resource, in which case it expects a NOT_RUNNING response. It's just checking assumptions at that point.
>>
>> 2. A resource::start must only return when resource::monitor is successful. Basically the logic of a start() must follow this:
>>
>> start() {
>>    start_daemon
>>    # Poll until the monitor action reports the resource as running.
>>    while ! monitor; do
>>        sleep 1
>>    done
>>    return $OCF_SUCCESS
>> }
>>
>>> -----Original Message-----
>>> From: Miloš Kozák [mailto:milos.kozak at lejmr.com]
>>> Sent: Wednesday, August 12, 2015 10:03 AM
>>> To: users at clusterlabs.org
>>> Subject: [ClusterLabs] Delayed first monitoring
>>>
>>> Hi,
>>>
>>> I have set up CoroSync+CMAN+Pacemaker on CentOS 6.5 in order to
>>> provide high availability for opennebula. However, I am facing a
>>> strange problem which arises from my lack of knowledge.
>>>
>>> In the log I can see that when I create a resource based on an init
>>> script, typically:
>>>
>>> pcs resource create httpd lsb:httpd
>>>
>>> The httpd daemon gets started, but a monitor is initiated at the same
>>> time and the resource is identified as not running. This behaviour makes
>>> sense once we realize that starting the daemon takes some time. In this
>>> particular case, I get error code 2, which means that the process is
>>> running but the environment is not locked. The effect of this is that
>>> the httpd resource gets restarted.
>>>
>>> My workaround is an extra sleep in the status function of the init
>>> script, but I don't like this solution at all! Do you have any idea how
>>> to tackle this problem in a proper way? I expected an op attribute which
>>> would specify a delay between service start and the first monitoring,
>>> but I could not find it.
>>>
>>> Thank you, Milos
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>
>
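
For reference, the two suggestions quoted above (return success straight away if the resource is already running, and return from start only once monitor succeeds) can be combined into a single start function. What follows is only a minimal sketch under stated assumptions, not code from any shipped agent or init script: start_daemon, monitor, resource_start, STATE_FILE, START_TIMEOUT and POLL_INTERVAL are hypothetical names, the "daemon" is a stand-in subshell that creates a file after one second, and the internal timeout is an extra guard added here so the loop cannot spin forever.

```shell
#!/bin/sh
# Exit codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical settings, overridable from the environment.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/demo-resource.state}"
START_TIMEOUT="${START_TIMEOUT:-30}"   # whole seconds to wait for readiness
POLL_INTERVAL="${POLL_INTERVAL:-1}"    # whole seconds between monitor calls

# Stand-in for launching the real service: it only becomes "ready"
# (state file present) about one second after being launched.
start_daemon() {
    ( sleep 1; : > "$STATE_FILE" ) &
}

# Stand-in health check: running if the state file exists.
monitor() {
    if [ -e "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

resource_start() {
    # Already running: starting again is a successful no-op.
    if monitor; then
        return $OCF_SUCCESS
    fi

    start_daemon || return $OCF_ERR_GENERIC

    # Do not report success until monitor agrees the service is up.
    waited=0
    while ! monitor; do
        if [ "$waited" -ge "$START_TIMEOUT" ]; then
            return $OCF_ERR_GENERIC
        fi
        sleep "$POLL_INTERVAL"
        waited=$((waited + POLL_INTERVAL))
    done
    return $OCF_SUCCESS
}
```

In a real init script or agent, start_daemon and monitor would be replaced by the actual launch command and status check; the point of the sketch is only that the wait lives in start, so the status/monitor path needs no extra sleep.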




