[ClusterLabs] Delayed first monitoring

Sun Aug 16 19:48:05 EDT 2015

> On 13 Aug 2015, at 2:20 am, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
> On 08/12/2015 10:45 AM, Miloš Kozák wrote:
>> Thank you for your answer, but:
>> 
>> 1) This sounds OK, but in other words it means that a delayed first
>> check is not possible.
>> 
>> 2) Start of the init script? I use the LSB scripts from the distribution,
>> so there is no way to change them (I can change them, but package
>> upgrades will overwrite my changes). This is quite a typical situation.
>> How can I do HA for Atlassian, for example? Jira takes 5 minutes to load.
> 
> I think your situation involves multiple issues which are worth
> separating for clarity:
> 
> 1. As Alexander mentioned, Pacemaker will do a monitor BEFORE trying to
> start a service, to make sure it's not already running. So these don't
> need any delay and are expected to "fail".
> 
> 2. Resource agents MUST NOT return success for "start" until the service
> is fully up and running, so the next monitor should succeed, again
> without needing any delay. If that's not the case, it's a bug in the agent.

Consider the ordering constraint “start A then B”.

Regardless of whether you delay A’s monitor operation, B is going to expect that A is up when “start A” completes.
So the start action should only indicate completion once A is actually usable.
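
A minimal sketch of that rule in shell, under stated assumptions: start_daemon and monitor here are hypothetical stand-ins (a temp file plays the role of "service is ready"), not code from any real agent. The point is only that start blocks until monitor succeeds.

```shell
#!/bin/sh
# Sketch only: start_daemon and monitor are hypothetical stand-ins for
# whatever a real agent does. A temp file stands in for "service is ready".

OCF_SUCCESS=0       # standard OCF return code for success
OCF_NOT_RUNNING=7   # standard OCF return code for "not running"

READY_FLAG="${TMPDIR:-/tmp}/demo_service_ready.$$"

# Stand-in: the "daemon" becomes usable about one second after launch.
start_daemon() {
    ( sleep 1; : > "$READY_FLAG" ) &
}

# Stand-in: report success only once the service is actually usable.
monitor() {
    if [ -e "$READY_FLAG" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# start does not return until monitor succeeds, so anything ordered
# after it ("start A then B") can rely on the service being up.
# There is no timeout here: the cluster's start timeout bounds the wait.
start() {
    start_daemon
    while ! monitor; do
        sleep 1
    done
    return $OCF_SUCCESS
}

# Stand-in stop: mark the service as no longer running.
stop() {
    rm -f "$READY_FLAG"
    return $OCF_SUCCESS
}
```

With this shape, a slow service only needs a longer start timeout, not a delayed monitor.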

> 
> 3. It's generally better to use OCF resource agents whenever available,
> as they have better integration with pacemaker than lsb/systemd/upstart.
> In this case, take a look at ocf:heartbeat:apache.
> 
> 4. You can configure the timeout used with each action (stop, start,
> monitor, restart) on a given resource. The default is 20 seconds. For
> example, if a "start" action is expected to take 5 minutes, you would
> define a start operation on the resource with timeout=300s. How you do
> that depends on your management tool (pcs, crmsh, or cibadmin).
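
With pcs, for example, this might look roughly like the following. The resource name jira and the lsb:jira init script are assumptions for illustration only; adjust the names and values to your setup.

```shell
# Hypothetical resource "jira" backed by an assumed /etc/init.d/jira script.
# Allow up to 5 minutes for start; monitor every 30 seconds once it is up.
pcs resource create jira lsb:jira \
    op start timeout=300s \
    op monitor interval=30s

# Or raise the start timeout on the existing httpd resource.
pcs resource update httpd op start timeout=300s
```
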
> 
> Bottom line: you should never need a delay on the monitor. Instead, set
> appropriate timeouts for each action, and make sure that the agent does
> not return from "start" until the service is fully up.
> 
>> Dne 12.8.2015 v 16:14 Nekrasov, Alexander napsal(a):
>>> 1. Pacemaker will/may call a monitor before starting a resource, in
>>> which case it expects a NOT_RUNNING response. It's just checking
>>> assumptions at that point.
>>> 
>>> 2. A resource::start must only return when resource::monitor is
>>> successful. Basically the logic of a start() must follow this:
>>> 
>>> start() {
>>>   start_daemon              # agent-specific: launch the service
>>>   # poll until the agent's own monitor reports the service as up
>>>   while ! monitor; do
>>>       sleep 1
>>>   done
>>>   return $OCF_SUCCESS
>>> }
>>> 
>>>> -----Original Message-----
>>>> From: Miloš Kozák [mailto:milos.kozak at lejmr.com]
>>>> Sent: Wednesday, August 12, 2015 10:03 AM
>>>> To: users at clusterlabs.org
>>>> Subject: [ClusterLabs] Delayed first monitoring
>>>> 
>>>> Hi,
>>>> 
>>>> I have set up Corosync+CMAN+Pacemaker on CentOS 6.5 in order to
>>>> provide high availability for OpenNebula. However, I am facing a
>>>> strange problem which arises from my lack of knowledge.
>>>> 
>>>> In the log I can see that when I create a resource based on an init
>>>> script, typically:
>>>> 
>>>> pcs resource create httpd lsb:httpd
>>>> 
>>>> The httpd daemon gets started, but monitor is initiated at the same time
>>>> and the resource is identified as not running. This behaviour makes
>>>> sense, given that starting the daemon takes some time. In this
>>>> particular case, I get error code 2 which means that process is running,
>>>> but environment is not locked. The effect of this is that httpd resource
>>>> gets restarted.
>>>> 
>>>> My workaround is an extra sleep in the status function of the init
>>>> script, but I don't like this solution at all! Do you have any idea how
>>>> to tackle this problem in a proper way? I expected an op attribute which
>>>> would specify a delay between service start and the first monitor, but I
>>>> could not find it.
>>>> 
>>>> Thank you, Milos