[Pacemaker] Failure of a monitor operation disappears from crm_mon

Junko IKEDA ikedaj at intellilink.co.jp
Mon Sep 8 21:34:35 EDT 2008


> because it is no longer in the current start/stop series for the
> resource.
> 
> >
> > It might be an expected behavior for now,
> 
> it is.
> 
> >
> > it would be convenient if crm_mon could keep showing some past failures.
> 
> it can't display them forever.  they are not (and should not be) kept
> in the CIB forever as it would cause the CIB size to explode.

It's true that the CIB would grow larger if it kept them.
I'd like to ask one more question about this case:

(1) The resource starts on the DC

============
Last updated: Tue Sep  9 09:51:51 2008
Current DC: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3)
2 Nodes configured.
1 Resources configured.
============

Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online

Full list of resources:

dummy   (ocf::heartbeat:Dummy): Started node-b

Operations:
* Node node-b:
   dummy:
    + start: rc=0 (ok)
    + monitor: interval=10000ms rc=0 (ok)
* Node node-a:



(2) The resource fails over from the DC to the non-DC node

============
Last updated: Tue Sep  9 09:52:17 2008
Current DC: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3)
2 Nodes configured.
1 Resources configured.
============

Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online

Full list of resources:

dummy   (ocf::heartbeat:Dummy): Started node-a

Operations:
* Node node-b:
   dummy:  fail-count=1
    + start: rc=0 (ok)
    + monitor: interval=10000ms rc=7 (not running)
    + stop: rc=0 (ok)
* Node node-a:
   dummy:
    + start: rc=0 (ok)
    + monitor: interval=10000ms rc=0 (ok)

Failed actions:
    dummy_monitor_10000 (node=node-b, call=4, rc=7): complete



(3) Stop the non-DC node

============
Last updated: Tue Sep  9 09:52:45 2008
Current DC: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3)
2 Nodes configured.
1 Resources configured.
============

Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): OFFLINE

Full list of resources:

dummy   (ocf::heartbeat:Dummy): Stopped

Operations:
* Node node-b:
   dummy:  fail-count=1
    + start: rc=0 (ok)
    + monitor: interval=10000ms rc=7 (not running)
    + stop: rc=0 (ok)
* Node node-a:
   dummy:
    + start: rc=0 (ok)
    + monitor: interval=10000ms rc=0 (ok)
    + stop: rc=0 (ok)

Failed actions:
    dummy_monitor_10000 (node=node-b, call=4, rc=7): complete



It seems that the DC keeps its own failure history.
Does this mean that dummy_monitor_10000 (call=4) is still part of the
current start/stop series?


Thanks,
Junko
