[Pacemaker] Dangling last-failure transient attribute

Vladislav Bogdanov bubble at hoster-ok.com
Tue Nov 20 02:30:00 EST 2012


Looking at pengine inputs (06229e9) I noticed that there are transient
last-failure-<rsc_id> attributes for resources that last failed a long
time ago (more than 60000 seconds).

Example is:
  <node_state id="1107559690" uname="vd01-c" in_ccm="true" crmd="online"
join="member" expected="member" crm-debug-origin="do_state_transition">
    <transient_attributes id="1107559690">
      <instance_attributes id="status-1107559690">
        <nvpair id="status-1107559690-probe_complete"
name="probe_complete" value="true"/>
name="last-failure-bubble-test01.vds-ok.com-vm" value="1353335921"/>

date +%s shows 1353396142, so 1353396142-1353335921=60221
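
The same age check can be written as a short Python sketch. The helper
name failure_age is hypothetical; the two timestamps are the ones quoted
above.

```python
def failure_age(last_failure: int, now: int) -> int:
    """Return the seconds elapsed between a last-failure timestamp and now.

    Both arguments are epoch seconds, as printed by `date +%s`.
    """
    return now - last_failure


# Values from the transient attribute and the `date +%s` output above.
age = failure_age(1353335921, 1353396142)
print(age)
```

With these inputs the result is 60221 seconds, a little under 17 hours.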

According to the code (update_failcount() and handle_failcount_op()), that
attribute is always added/removed in a pair with fail-count-<rsc_id>,
but the latter is not in the list for any node.
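
To find such unpaired attributes in a status section, something like the
following Python sketch could be used. It is not Pacemaker code. The
function name dangling_last_failures is hypothetical, and the sample XML is
a simplified, well-formed stand-in modelled on the fragment above, with
placeholder nvpair ids ("a", "b").

```python
import xml.etree.ElementTree as ET

FAIL_COUNT = "fail-count-"
LAST_FAILURE = "last-failure-"


def dangling_last_failures(status_xml: str) -> list:
    """Return sorted (uname, rsc_id) pairs that have a last-failure-<rsc_id>
    transient attribute but no matching fail-count-<rsc_id> on the same node."""
    root = ET.fromstring(status_xml.strip())
    dangling = []
    for node in root.iter("node_state"):
        # Only look at transient attributes, not at the rest of node_state.
        nvpairs = node.findall("./transient_attributes/instance_attributes/nvpair")
        names = {nv.get("name", "") for nv in nvpairs}
        for name in names:
            if name.startswith(LAST_FAILURE):
                rsc_id = name[len(LAST_FAILURE):]
                if FAIL_COUNT + rsc_id not in names:
                    dangling.append((node.get("uname", ""), rsc_id))
    return sorted(dangling)


# Simplified sample; the nvpair ids are placeholders, not real CIB ids.
SAMPLE = """
<status>
  <node_state id="1107559690" uname="vd01-c">
    <transient_attributes id="1107559690">
      <instance_attributes id="status-1107559690">
        <nvpair id="a" name="probe_complete" value="true"/>
        <nvpair id="b" name="last-failure-bubble-test01.vds-ok.com-vm"
                value="1353335921"/>
      </instance_attributes>
    </transient_attributes>
  </node_state>
</status>
"""

for uname, rsc_id in dangling_last_failures(SAMPLE):
    print(uname, rsc_id)
```

For the sample, this reports the resource bubble-test01.vds-ok.com-vm on
node vd01-c, because no fail-count attribute for it is present.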

Is that a bug or a feature? Or am I just missing something?


