[ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

Sun Apr 24 17:20:02 EDT 2016

On 04/22/2016 01:13 PM, Dimitri Maziuk wrote:
> On 04/22/2016 12:58 PM, Ken Gaillot wrote:
> 
>>> Consider that monitoring - at least as part of the action -
>>> should check if what your service is actually providing is
>>> working according to some functional and nonfunctional
>>> constraints as to simulate the experience of the consumer of
>>> your services.
> 
> Goedel and Turing say the only one who can answer that is the
> actual consumer. So a simple check for what you *can* check would
> be very nice indeed.
> 
>> Also, you can provide multiple levels of monitoring:
>> 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_multiple_monitor_operations
>>
>>
>> 
>> For example, you could provide a very simple check that just makes sure
>> MySQL is responding on its port, and run that frequently with a
>> low timeout. And your existing thorough monitor could be run less
>> frequently with a high timeout.
> 
> Looking at this, it seems you have to actually rewrite the RA to
> switch on $OCF_CHECK_LEVEL -- unless the stock RA already provides
> the "simple check" you need, is that correct?
> 
> E.g. this page:
> http://linux-ha.org/doc/man-pages/re-ra-apache.html suggests that
> apache RA does not and all you can do in practice is run the same
> curl http:/localhost/server-status check with different 
> frequencies. Would that be what we actually have ATM?

Correct, you would need to customize the RA. Given how long you said a
check can take, I assumed you already had a custom check that did
something more detailed than the stock mysql RA.