[Pacemaker] [Problem or Enhancement] When attrd restarts, the fail-count is reset.

Mon Sep 27 05:26:17 UTC 2010

Hi,

While investigating another problem, I found the following behavior.
If the attrd process fails and is not restarted, the problem does not occur.

Step 1) After startup, cause a monitor error in UmIPaddr twice.

Online: [ srv01 srv02 ]

 Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01

Migration summary:
* Node srv02: 
* Node srv01: 
   UmIPaddr: migration-threshold=10 fail-count=2

Step 2) Kill attrd. attrd is restarted.

Online: [ srv01 srv02 ]

 Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01

Migration summary:
* Node srv02: 
* Node srv01: 
   UmIPaddr: migration-threshold=10 fail-count=2

Step 3) Cause one more monitor error in UmIPaddr.

Online: [ srv01 srv02 ]

 Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01

Migration summary:
* Node srv02: 
* Node srv01: 
   UmIPaddr: migration-threshold=10 fail-count=1 -----> The fail-count goes back to 1 (expected 3).

The cause is that attrd loses the fail-count when it restarts (its hash tables are lost).
It is a serious problem that the number of failures is reset.
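
To make the symptom concrete, here is a minimal toy model of the three steps. It is not Pacemaker code: ToyAttrd, its increment() method, the plain dict standing in for the cib, and the key name are all hypothetical, and I am assuming that the failure is recorded as an increment of the value attrd holds in memory.

```python
# Toy model only: this is not Pacemaker code. ToyAttrd, increment() and KEY
# are hypothetical names used to illustrate the symptom.

KEY = "fail-count-UmIPaddr"  # illustrative key name, an assumption


class ToyAttrd:
    """Keeps attribute values only in an in-memory dict (the "hash table")."""

    def __init__(self):
        self.values = {}  # lost whenever the process is replaced

    def increment(self, name):
        # An unknown attribute is treated as 0 before incrementing.
        self.values[name] = self.values.get(name, 0) + 1
        return self.values[name]


cib = {}  # stands in for the value stored in the cib

attrd = ToyAttrd()
for _ in range(2):  # Step 1: two monitor errors
    cib[KEY] = attrd.increment(KEY)

attrd = ToyAttrd()  # Step 2: attrd is killed and restarted, the dict is empty

# The cib still holds 2 at this point, as in the Step 2 output.
count_after_restart = cib[KEY]

# Step 3: one more monitor error overwrites the cib value with 1.
cib[KEY] = attrd.increment(KEY)
count_after_failure = cib[KEY]
```

In this model the stored value stays at 2 across the restart and only drops to 1 when the next failure is recorded, which matches the output above.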

I can think of the following methods.

Method 1) attrd keeps the fail-count in a file under the "/var/run" directory and reads it back when it starts.

Method 2) When attrd starts, it communicates with the cib and retrieves the fail-count.
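
The two methods can be sketched with a toy model. This is only an illustration under assumptions, not a proposal for the real attrd code: ToyAttrd, run_scenario(), the JSON state file, and the key name are hypothetical, a plain dict stands in for the cib, and a temporary directory is used in place of "/var/run" so the sketch can run anywhere.

```python
import json
import os
import tempfile

KEY = "fail-count-UmIPaddr"  # illustrative key name, an assumption


class ToyAttrd:
    """Hypothetical attrd stand-in that can recover its values on startup."""

    def __init__(self, state_file=None, cib=None):
        self.values = {}
        self.state_file = state_file
        if state_file and os.path.exists(state_file):
            # Method 1: read back the values saved by the previous process.
            with open(state_file) as f:
                self.values = json.load(f)
        elif cib is not None:
            # Method 2: seed the in-memory table from what the cib holds.
            self.values = dict(cib)

    def increment(self, name):
        self.values[name] = self.values.get(name, 0) + 1
        if self.state_file:
            # Write to a temporary file and rename it, so that a crash
            # cannot leave a half-written state file behind.
            tmp = self.state_file + ".tmp"
            with open(tmp, "w") as f:
                json.dump(self.values, f)
            os.replace(tmp, self.state_file)
        return self.values[name]


def run_scenario(make_attrd, cib):
    """Two failures, a restart of attrd, then one more failure."""
    attrd = make_attrd()
    for _ in range(2):
        cib[KEY] = attrd.increment(KEY)
    attrd = make_attrd()  # attrd is killed and restarted
    cib[KEY] = attrd.increment(KEY)
    return cib[KEY]


# Method 1: state file (a temporary directory replaces "/var/run" here).
state_file = os.path.join(tempfile.mkdtemp(), "attrd-state.json")
result_method1 = run_scenario(lambda: ToyAttrd(state_file=state_file), {})

# Method 2: the restarted process reads the current values from the cib.
cib = {}
result_method2 = run_scenario(lambda: ToyAttrd(cib=cib), cib)
```

In both cases the toy fail-count reaches 3 after the restart instead of going back to 1. The sketch does not cover what should happen if the file and the cib disagree.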

Is there a better method?

Please consider a solution to this problem.

Best Regards,
Hideo Yamauchi.