[ClusterLabs] Cluster monitoring

Wed Oct 21 10:56:11 EDT 2015

On 10/21/2015 08:24 AM, Michael Schwartzkopff wrote:
> On Wednesday, 21 October 2015, 18:50:15, Arjun Pandey wrote:
>> Hi folks
>> 
>> I had a question on monitoring of cluster events. Based on the
>> documentation, it seems that the cluster monitor is the only method
>> of monitoring cluster events. Also, since it seems to poll at the
>> configured interval, it might miss some events. Is that the case?
> 
> No. The cluster is event-based, so it won't miss any event. If you
> use the cluster's tools, they see the events. If you monitor the
> events, you won't miss any either.

FYI, Pacemaker 1.1.14 will have built-in handling of notification
scripts, without needing a ClusterMon resource. These will be
event-driven. Andrew Beekhof did a recent blog post about it:
http://blog.clusterlabs.org/blog/2015/reliable-notifications/

Pacemaker's resource monitors, however, are polling-based: they run at
the interval specified when configuring the monitor operation.
Pacemaker relies on the resource agent to return status for monitors,
so technically it's up to the resource agent whether it can "miss"
brief outages that occur between polls. All the ones I've looked at
would miss them, but generally that's considered acceptable if the
service is once again fully working when the monitor runs (because
that implies it recovered on its own).
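
To make the "missed between polls" case concrete, here is a minimal
sketch. It is not Pacemaker code; the function name and the numbers are
made up for illustration. It just lists which poll times fall inside an
outage window, assuming polls fire at exact multiples of the interval:

```python
def polls_that_see_outage(interval, outage_start, outage_end, horizon):
    """Return the poll times that land inside [outage_start, outage_end).

    Hypothetical helper: polls are assumed to fire at t = 0, interval,
    2 * interval, ... up to and including horizon (all in seconds).
    """
    return [t for t in range(0, horizon + 1, interval)
            if outage_start <= t < outage_end]


# 10-second monitor interval, 4-second outage from t=12 to t=16:
# the polls at t=10 and t=20 both see a healthy service.
missed = polls_that_see_outage(10, 12, 16, 60)

# Same interval, longer outage from t=12 to t=25:
# the poll at t=20 lands inside the outage and would report a failure.
caught = polls_that_see_outage(10, 12, 25, 60)
```

An outage shorter than the interval may or may not be seen, depending on
where it falls relative to the polls; one at least as long as the
interval always overlaps a poll in this simplified model.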

Some people use an external monitoring system (nagios, icinga, zabbix,
etc.) in addition to Pacemaker's monitors. They can complement each
other, as the external system can check system parameters outside
Pacemaker's view and can alert administrators for some early warning
signs before a resource gets to the point of needing recovery. Of
course, such monitoring systems also poll at configured intervals.