[Pacemaker] Expired fail-count doesn't get cleaned up.
David Coulson
david at davidcoulson.net
Tue Jul 31 09:36:39 UTC 2012

I'm running RHEL6 with the tech preview of Pacemaker that ships with it. I
have a number of resources configured with failure-timeout="60", which
works as intended most of the time.

Last night a resource that is part of a clone failed. The resource
recovered, but its fail-count was never cleaned up, and about once a
second the DC logged the pengine message below. I manually ran a
resource cleanup, and it seems happy now. Is there something I should
look for in the logs that would indicate it 'missed' expiring this
fail-count?

Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558

Migration summary:
* Node dresproddns01:
    re-openfire-lsb:0: migration-threshold=1000000 fail-count=1 last-failure='Mon Jul 30 21:57:53 2012'
* Node dresproddns02:

Jul 31 05:32:34 dresproddns02 pengine: [2860]: notice: get_failcount: Failcount for cl-openfire on dresproddns01 has expired (limit was 60s)
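
For reference, here is a rough sketch of how far past the 60s limit the failure was when that message was logged. It is not part of Pacemaker. The helper name seconds_since_failure is made up for illustration, and it assumes the last-failure value and the syslog timestamp are in the same time zone and the same year (syslog lines carry no year).

```python
from datetime import datetime

# failure-timeout="60" from the resource configuration described above
FAILURE_TIMEOUT_S = 60


def seconds_since_failure(last_failure: str, log_stamp: str, year: int) -> float:
    """Seconds between a last-failure value and a syslog timestamp.

    Hypothetical helper. Assumes both stamps share one time zone and that
    the syslog line belongs to the given year.
    """
    # last-failure format as shown in the migration summary
    failed_at = datetime.strptime(last_failure, "%a %b %d %H:%M:%S %Y")
    # syslog timestamps omit the year, so it is supplied separately
    logged_at = datetime.strptime(f"{year} {log_stamp}", "%Y %b %d %H:%M:%S")
    return (logged_at - failed_at).total_seconds()


elapsed = seconds_since_failure("Mon Jul 30 21:57:53 2012", "Jul 31 05:32:34", 2012)
print(f"{elapsed:.0f}s since last failure; past the limit: {elapsed > FAILURE_TIMEOUT_S}")
```

With the values above, the gap is roughly seven and a half hours, so the failure was long past its 60s timeout while fail-count=1 was still being reported.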