[ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

Dimitri Maziuk dmaziuk at bmrb.wisc.edu
Fri Apr 22 18:13:22 UTC 2016


On 04/22/2016 12:58 PM, Ken Gaillot wrote:

>> Consider that monitoring - at least as part of the action - should check
>> whether what your service is actually providing is working according to
>> some functional and nonfunctional constraints, so as to simulate the
>> experience of the consumer of your services.

Goedel and Turing say the only one who can answer that is the actual
consumer. So a simple check for what you *can* check would be very nice
indeed.

> Also, you can provide multiple levels of monitoring:
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_multiple_monitor_operations
> 
> For example, you could provide a very simple check that just makes sure
> MySQL is responding on its port, and run that frequently with a low
> timeout. And your existing thorough monitor could be run less frequently
> with a high timeout.

Looking at this, it seems you have to actually rewrite the RA to switch
on $OCF_CHECK_LEVEL -- unless the stock RA already provides the "simple
check" you need, is that correct?
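
To make the question concrete, here is a rough sketch of what "switching
on $OCF_CHECK_LEVEL" inside an RA's monitor action might look like. This is
not taken from any stock RA: ra_monitor, shallow_check and deep_check are
hypothetical names, the two checks are placeholders, and the return codes
are defined locally only so the fragment stands alone (a real RA would get
them from ocf-shellfuncs).

```shell
#!/bin/sh
# Hypothetical sketch of a monitor action that honours OCF_CHECK_LEVEL.
# Standard OCF return codes, defined locally so the fragment is standalone.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Placeholder for a cheap probe (e.g. "is the port answering?").
shallow_check() {
    return 0
}

# Placeholder for the thorough, expensive probe.
deep_check() {
    return 0
}

ra_monitor() {
    # Every level runs the cheap check first.
    shallow_check || return "$OCF_NOT_RUNNING"

    # Only level 10 and above also runs the expensive check.
    if [ "${OCF_CHECK_LEVEL:-0}" -ge 10 ]; then
        deep_check || return "$OCF_ERR_GENERIC"
    fi

    return "$OCF_SUCCESS"
}
```

With something like this, a monitor op without OCF_CHECK_LEVEL (or with 0)
only pays for the cheap probe, and an op configured with OCF_CHECK_LEVEL=10
pays for both.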

E.g. this page: http://linux-ha.org/doc/man-pages/re-ra-apache.html
suggests that the apache RA does not, and all you can do in practice is run
the same curl http://localhost/server-status check at different
frequencies. Would that be what we actually have ATM?
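
For reference, the two-monitor setup from the linked section would look
roughly like this with pcs. The resource name WebSite and the
interval/timeout values are made up for illustration, and whether the second
op does anything beyond the first depends entirely on the RA acting on
OCF_CHECK_LEVEL. The intervals have to differ, and the first command may
clash with a monitor op the resource already has at that interval.

```shell
# Frequent check with a short timeout (default check level).
pcs resource op add WebSite monitor interval=10s timeout=5s

# Less frequent check with a long timeout, asking the RA for level 10.
pcs resource op add WebSite monitor interval=60s timeout=60s OCF_CHECK_LEVEL=10
```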

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
