[Pacemaker] monitor operation stopped running

Tue Dec 14 10:16:22 UTC 2010

Hi

I have noticed this happening a few times on several of my clusters.
The monitor operation for some resources stops running, so resource
failures are not detected.  If I edit the CIB and change something
about the resource (generally the monitor interval), monitoring
resumes, the failure is detected, and the resource is restarted
correctly.
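The workaround above (changing the monitor interval to make monitoring resume) can be scripted. The Python sketch below rewrites the monitor interval of one resource in a CIB XML document. It assumes the usual layout, a `<primitive>` with `<op name="monitor">` children under `<operations>`, possibly nested inside a `<clone>`. The sample fragment, its ids, and the `lsb` class are hypothetical. Fetching and loading the live CIB (for example with `cibadmin`) is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment: a primitive cloned across nodes, with one monitor op.
SAMPLE_CIB = """<resources>
  <clone id="megaswitch-clone">
    <primitive id="megaswitch" class="lsb" type="megaswitch">
      <operations>
        <op id="megaswitch-monitor" name="monitor" interval="10s"/>
      </operations>
    </primitive>
  </clone>
</resources>"""


def set_monitor_interval(cib_xml: str, resource_id: str, new_interval: str) -> str:
    """Return cib_xml with the monitor interval of resource_id replaced.

    Sketch only: assumes the monitor is an <op name="monitor"> element somewhere
    below the <primitive> whose id is resource_id.
    """
    root = ET.fromstring(cib_xml)
    changed = False
    # iter() walks the whole tree, so primitives nested in a <clone> are found too.
    for prim in root.iter("primitive"):
        if prim.get("id") != resource_id:
            continue
        for op in prim.iter("op"):
            if op.get("name") == "monitor":
                op.set("interval", new_interval)
                changed = True
    if not changed:
        raise ValueError(f"no monitor op found for resource {resource_id!r}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_monitor_interval(SAMPLE_CIB, "megaswitch", "15s"))
```

Raising an error when nothing matched avoids silently loading back an unchanged CIB. Changing the interval does not fix the underlying bug, but it forces the monitor operation to be rescheduled.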

I am using Pacemaker 1.0.9 in production and 1.0.10 in test.

This has happened with both clone and non-clone resources.

I have attached a log that shows the behaviour.  The resource
(megaswitch) is cloned across 6 nodes.

Until 06:48:22, the monitor runs correctly (the application logs a
"Deleting context for MONTEST-" line each time the monitor runs).
After that, the monitor is never run again on this node.
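Because each monitor run leaves a "Deleting context for MONTEST-" line, the point where monitoring stops can be found mechanically. The Python sketch below scans syslog-style lines and reports every stretch where that marker is missing for longer than a tolerance. It assumes lines start with a 15-character syslog timestamp such as "Dec 14 06:48:22". It also assumes a year supplied by the caller, since syslog omits it. The monitor interval is a parameter because the real value is not given here, and the sample lines are invented for illustration, not taken from the attached log.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

MARKER = "Deleting context for MONTEST-"


def _timestamp(line: str, year: int) -> datetime:
    # Syslog timestamps look like "Dec 14 06:48:22" and carry no year.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_monitor_gaps(
    lines: Iterable[str], interval_s: float, year: int = 2010, slack: float = 2.0
) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs where no marker line appeared for more than
    slack * interval_s seconds, including a trailing gap up to the last log line."""
    limit = timedelta(seconds=interval_s * slack)
    last_marker: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    gaps: List[Tuple[datetime, datetime]] = []
    for line in lines:
        try:
            ts = _timestamp(line, year)
        except ValueError:
            continue  # skip lines without a syslog-style timestamp
        last_seen = ts
        if MARKER in line:
            if last_marker is not None and ts - last_marker > limit:
                gaps.append((last_marker, ts))
            last_marker = ts
    # A monitor that stops for good shows up as a gap running to the end of the log.
    if last_marker is not None and last_seen is not None and last_seen - last_marker > limit:
        gaps.append((last_marker, last_seen))
    return gaps


if __name__ == "__main__":
    sample = [  # hypothetical lines, assuming a 15 s monitor interval
        "Dec 14 06:47:52 node1 megaswitch: Deleting context for MONTEST-1",
        "Dec 14 06:48:07 node1 megaswitch: Deleting context for MONTEST-2",
        "Dec 14 06:48:22 node1 megaswitch: Deleting context for MONTEST-3",
        "Dec 14 07:30:00 node1 other: unrelated message",
    ]
    for start, end in find_monitor_gaps(sample, interval_s=15):
        print(f"no monitor between {start} and {end}")
```

A gzipped log such as the attached one could be passed in as `gzip.open(path, "rt")`, since the function accepts any iterable of text lines.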

I have the logs for the other nodes if they are needed to help debug
this.

-- 
Chris Picton

Executive Manager - Systems
ECN Telecommunications (Pty) Ltd
t:   010 590 0031 m: 079 721 8521
f:   087 941 0813
e:  chris at ecntelecoms.com

"Lowering the cost of doing business"

-------------- next part --------------
A non-text attachment was scrubbed...
Name: log.txt.gz
Type: application/x-gzip
Size: 3851 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20101214/4c04438e/attachment-0001.bin>