[Pacemaker] removed resources still generating log entries

Kevin Maguire kmaguire at eso.org
Thu Dec 22 06:43:51 EST 2011


Hi

We have built a cluster on top of the SLES 11 SP1 stack, which manages various Xen VMs.

In the development phase we used some test VM resources, which have since been removed from the resource list. However, I still see remnants of these old resources in the log files, and would like to clean this up.

For example, I see:

Dec 22 12:27:18 node2 pengine: [6262]: info: get_failcount: hvm1 has failed 1 times on node2
Dec 22 12:27:18 node2 pengine: [6262]: notice: common_apply_stickiness: hvm1 can fail 999999 more times on node2 before being forced off
Dec 22 12:27:18 node2 attrd: [6261]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-hvm1 (1)
Dec 22 12:27:18 node2 attrd: [6261]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-hvm1 (1322579680)

hvm1 was a VM in that test phase.
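
For what it's worth, my understanding is that the fail-count is kept as a transient node attribute, so something like the following should query and delete it with crm_failcount; I have not tried this yet and the exact option syntax may differ on this version:

# query the stale fail-count for hvm1 on node2 (untested, syntax as I recall it from the Pacemaker 1.0/1.1 docs)
crm_failcount -G -U node2 -r hvm1
# delete the stale fail-count
crm_failcount -D -U node2 -r hvm1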

If I dump the CIB, I find this section:

  <status>
    <node_state uname="node2" ha="active" in_ccm="true" crmd="online" join="member" expected="member" shutdown="0" id="node2" crm-debug-origin="do_state_transition">
      <lrm id="node2">
        <lrm_resources>
...
          <lrm_resource id="hvm1" type="Xen" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="hvm1_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.2" transition-key="20:11:7:1fd9e9b1-610e-4768-abd5-35ea3ce45c4d" transition-magic="0:7;20:11:7:1fd9e9b1-610e-4768-abd5-35ea3ce45c4d" call-id="27" rc-code="7" op-status="0" interval="0" last-run="1322130825" last-rc-change="1322130825" exec-time="550" queue-time="0" op-digest="71594dc818f53dfe034bb5e84c6d80fb"/>
            <lrm_rsc_op id="hvm1_stop_0" operation="stop" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.2" transition-key="61:511:0:abda911e-05ed-4e11-8e25-ab03a1bfd7b7" transition-magic="0:0;61:511:0:abda911e-05ed-4e11-8e25-ab03a1bfd7b7" call-id="56" rc-code="0" op-status="0" interval="0" last-run="1322580820" last-rc-change="1322580820" exec-time="164320" queue-time="0" op-digest="71594dc818f53dfe034bb5e84c6d80fb"/>
            <lrm_rsc_op id="hvm1_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.2" transition-key="59:16:0:1fd9e9b1-610e-4768-abd5-35ea3ce45c4d" transition-magic="0:0;59:16:0:1fd9e9b1-610e-4768-abd5-35ea3ce45c4d" call-id="30" rc-code="0" op-status="0" interval="0" last-run="1322131559" last-rc-change="1322131559" exec-time="470" queue-time="0" op-digest="71594dc818f53dfe034bb5e84c6d80fb"/>
          </lrm_resource>
...
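
To double-check where the stale data lives, I can query the status section on its own, and something like the following might even delete the lrm_resource block in place; I have not tested the delete, and the exact matching behaviour of cibadmin --delete is an assumption on my part:

# show only the status section and look for leftovers of hvm1
cibadmin -Q -o status | grep hvm1
# untested: ask cibadmin to delete the first element matching this tag and id from the live CIB
cibadmin --delete --xml-text '<lrm_resource id="hvm1"/>'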

I tried

cibadmin -Q > tmp.xml
vi tmp.xml
cibadmin --replace --xml-file tmp.xml

but this does not do the job, I guess because the problematic bits are in the status section, which the cluster maintains dynamically, so edits made there do not stick.
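
I also wondered whether a resource cleanup would clear both the lrm history and the fail-count, even though hvm1 is no longer configured; something along these lines, untested, and I am not sure cleanup still accepts a resource that has been removed from the configuration:

# via the crm shell that ships with SLES 11 SP1
crm resource cleanup hvm1
# or with the lower-level tool; whether a node must also be given (and with which option) may depend on the version
crm_resource --cleanup --resource hvm1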

Any clue how to clean this up properly, preferably without any cluster downtime?

Thanks,
Kevin

Version info:

node2 # rpm -qa | egrep "heartbeat|pacemaker|cluster|openais"
libopenais3-1.1.2-0.5.19
pacemaker-mgmt-2.0.0-0.2.19
openais-1.1.2-0.5.19
cluster-network-kmp-xen-1.4_2.6.32.12_0.6-2.1.73
libpacemaker3-1.1.2-0.2.1
drbd-heartbeat-8.3.7-0.4.15
cluster-glue-1.0.5-0.5.1
drbd-pacemaker-8.3.7-0.4.15
cluster-network-kmp-default-1.4_2.6.32.12_0.6-2.1.73
pacemaker-1.1.2-0.2.1
yast2-cluster-2.15.0-8.6.19
pacemaker-mgmt-client-2.0.0-0.2.19