<div dir="ltr"><div><div>Hi Andrew,<br><br></div>Here is the output of the verbose crm_failcount.<br><br> trace: set_crm_log_level: New log level: 8<br> trace: cib_native_signon_raw: Connecting cib_rw channel<br> trace: pick_ipc_buffer: Using max message size of 524288<br> debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096<br> debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096<br> debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096<br> trace: mainloop_add_fd: Added connection 1 for cib_rw[0x1fd79c0].4<br> trace: pick_ipc_buffer: Using max message size of 51200<br> trace: crm_ipc_send: Sending from client: cib_rw request id: 1 bytes: 131 timeout:-1 msg...<br> trace: crm_ipc_send: Recieved response 1, size=140, rc=140, text: <cib_common_callback_worker cib_op="register" cib_clientid="f8cfae2d-51e6-4cd7-97f8-2d6f49bf1f17"/><br> trace: cib_native_signon_raw: reg-reply <cib_common_callback_worker cib_op="register" cib_clientid="f8cfae2d-51e6-4cd7-97f8-2d6f49bf1f17"/><br> debug: cib_native_signon_raw: Connection to CIB successful<br> trace: cib_create_op: Sending call options: 00001100, 4352<br> trace: cib_native_perform_op_delegate: Sending cib_query message to CIB service (timeout=120s)<br> trace: crm_ipc_send: Sending from client: cib_rw request id: 2 bytes: 211 timeout:120000 msg...<br> trace: internal_ipc_get_reply: client cib_rw waiting on reply to msg id 2<br> trace: crm_ipc_send: Recieved response 2, size=944, rc=944, text: <cib-reply t="cib" cib_op="cib_query" cib_callid="2" cib_clientid="f8cfae2d-51e6-4cd7-97f8-2d6f49bf1f17" cib_callopt="4352" cib_rc="0"><cib_calldata><nodes><node uname="<a href="http://node2.domain.com">node2.domain.com</a>" id="o<br> trace: cib_native_perform_op_delegate: Reply <cib-reply t="cib" cib_op="cib_query" cib_callid="2" cib_clientid="f8cfae2d-51e6-4cd7-97f8-2d6f49bf1f17" cib_callopt="4352" cib_rc="0"><br> trace: cib_native_perform_op_delegate: Reply <cib_calldata><br> trace: cib_native_perform_op_delegate: Reply <nodes><br> trace: cib_native_perform_op_delegate: Reply <node uname="<a href="http://node2.domain.com">node2.domain.com</a>" id="<a href="http://node2.domain.com">node2.domain.com</a>"><br> trace: cib_native_perform_op_delegate: Reply <instance_attributes id="<a href="http://nodes-node2.domain.com">nodes-node2.domain.com</a>"><br> trace: cib_native_perform_op_delegate: Reply <nvpair id="nodes-node2.domain.com-postgres_msg-data-status" name="postgres_msg-data-status" value="STREAMING|SYNC"/><br> trace: cib_native_perform_op_delegate: Reply <nvpair id="nodes-node2.domain.com-standby" name="standby" value="off"/><br> trace: cib_native_perform_op_delegate: Reply </instance_attributes><br> trace: cib_native_perform_op_delegate: Reply </node><br> trace: cib_native_perform_op_delegate: Reply <node uname="<a href="http://node1.domain.com">node1.domain.com</a>" id="<a href="http://node1.domain.com">node1.domain.com</a>"><br> trace: cib_native_perform_op_delegate: Reply <instance_attributes id="<a href="http://nodes-node1.domain.com">nodes-node1.domain.com</a>"><br> trace: cib_native_perform_op_delegate: Reply <nvpair id="nodes-node1.domain.com-postgres_msg-data-status" name="postgres_msg-data-status" value="LATEST"/><br> trace: cib_native_perform_op_delegate: Reply <nvpair id="nodes-node1.domain.com-standby" name="standby" value="off"/><br> trace: cib_native_perform_op_delegate: Reply </instance_attributes><br> trace: cib_native_perform_op_delegate: Reply </node><br> trace: 
cib_native_perform_op_delegate: Reply </nodes><br> trace: cib_native_perform_op_delegate: Reply </cib_calldata><br> trace: cib_native_perform_op_delegate: Reply </cib-reply><br> trace: cib_native_perform_op_delegate: Syncronous reply 2 received<br> debug: get_cluster_node_uuid: Result section <nodes><br> debug: get_cluster_node_uuid: Result section <node uname="<a href="http://node2.domain.com">node2.domain.com</a>" id="<a href="http://node2.domain.com">node2.domain.com</a>"><br> debug: get_cluster_node_uuid: Result section <instance_attributes id="<a href="http://nodes-node2.domain.com">nodes-node2.domain.com</a>"><br> debug: get_cluster_node_uuid: Result section <nvpair id="nodes-node2.domain.com-postgres_msg-data-status" name="postgres_msg-data-status" value="STREAMING|SYNC"/><br> debug: get_cluster_node_uuid: Result section <nvpair id="nodes-node2.domain.com-standby" name="standby" value="off"/><br> debug: get_cluster_node_uuid: Result section </instance_attributes><br> debug: get_cluster_node_uuid: Result section </node><br> debug: get_cluster_node_uuid: Result section <node uname="<a href="http://node1.domain.com">node1.domain.com</a>" id="<a href="http://node1.domain.com">node1.domain.com</a>"><br> debug: get_cluster_node_uuid: Result section <instance_attributes id="<a href="http://nodes-node1.domain.com">nodes-node1.domain.com</a>"><br> debug: get_cluster_node_uuid: Result section <nvpair id="nodes-node1.domain.com-postgres_msg-data-status" name="postgres_msg-data-status" value="LATEST"/><br> debug: get_cluster_node_uuid: Result section <nvpair id="nodes-node1.domain.com-standby" name="standby" value="off"/><br> debug: get_cluster_node_uuid: Result section </instance_attributes><br> debug: get_cluster_node_uuid: Result section </node><br> debug: get_cluster_node_uuid: Result section </nodes><br> info: query_node_uuid: Mapped <a href="http://node1.domain.com">node1.domain.com</a> to <a href="http://node1.domain.com">node1.domain.com</a><br> trace: pick_ipc_buffer: Using max message size of 51200<br> info: attrd_update_delegate: Connecting to cluster... 
5 retries remaining<br> debug: qb_rb_open_2: shm size:51213; real_size:53248; rb->word_size:13312<br> debug: qb_rb_open_2: shm size:51213; real_size:53248; rb->word_size:13312<br> debug: qb_rb_open_2: shm size:51213; real_size:53248; rb->word_size:13312<br> trace: crm_ipc_send: Sending from client: attrd request id: 3 bytes: 168 timeout:5000 msg...<br> trace: internal_ipc_get_reply: client attrd waiting on reply to msg id 3<br> trace: crm_ipc_send: Recieved response 3, size=88, rc=88, text: <ack function="attrd_ipc_dispatch" line="129"/><br> debug: attrd_update_delegate: Sent update: (null)=(null) for <a href="http://node1.domain.com">node1.domain.com</a><br> info: main: Update (null)=<none> sent via attrd<br> debug: cib_native_signoff: Signing out of the CIB Service<br> trace: mainloop_del_fd: Removing client cib_rw[0x1fd79c0]<br> trace: mainloop_gio_destroy: Destroying client cib_rw[0x1fd79c0]<br> trace: crm_ipc_close: Disconnecting cib_rw IPC connection 0x1fdb020 (0x1fdb1a0.(nil))<br> debug: qb_ipcc_disconnect: qb_ipcc_disconnect()<br> trace: qb_rb_close: ENTERING qb_rb_close()<br> debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cib_rw-request-8347-9344-14-header<br> trace: qb_rb_close: ENTERING qb_rb_close()<br> debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cib_rw-response-8347-9344-14-header<br> trace: qb_rb_close: ENTERING qb_rb_close()<br> debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cib_rw-event-8347-9344-14-header<br> trace: cib_native_destroy: destroying 0x1fd7910<br> trace: crm_ipc_destroy: Destroying IPC connection to cib_rw: 0x1fdb020<br> trace: mainloop_gio_destroy: Destroyed client cib_rw[0x1fd79c0]<br> trace: crm_exit: cleaning up libxml<br> info: crm_xml_cleanup: Cleaning up memory from libxml2<br> trace: crm_exit: exit 0<br><br></div>I hope it helps.<br></div><div class="gmail_extra"><br><div class="gmail_quote">2015-05-20 6:34 GMT+02:00 Andrew Beekhof <span dir="ltr"><<a href="mailto:andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
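
For reference, the stale values can also be read straight from the status section of the CIB, and a targeted cleanup would look roughly like this. This is only a sketch: the resource and node names are examples taken from the migration summary quoted below, and crm_resource option spellings vary a little between 1.1 releases.

    # Dump the transient attributes the cluster still has recorded
    cibadmin -Q -o status | grep -E 'fail-count|last-failure'

    # Ask pacemaker to forget one resource's failures on one node
    crm_resource --cleanup --resource documents_drbd --node host1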

2015-05-20 6:34 GMT+02:00 Andrew Beekhof <andrew@beekhof.net>:

> > On 4 May 2015, at 6:43 pm, Alexandre <alxgomz@gmail.com> wrote:
> >
> > Hi,
> >
> > I have a pacemaker / corosync / cman cluster running on Red Hat 6.6.
> > Although the cluster is working as expected, I have some traces of old failures (from several months ago) that I can't get rid of.
> > Basically I have set cluster-recheck-interval="300" and failure-timeout="600" (in rsc_defaults) as shown below:
> >
> > property $id="cib-bootstrap-options" \
> >     dc-version="1.1.10-14.el6-368c726" \
> >     cluster-infrastructure="cman" \
> >     expected-quorum-votes="2" \
> >     no-quorum-policy="ignore" \
> >     stonith-enabled="false" \
> >     last-lrm-refresh="1429702408" \
> >     maintenance-mode="false" \
> >     cluster-recheck-interval="300"
> > rsc_defaults $id="rsc-options" \
> >     failure-timeout="600"
> >
> > So I would expect old failures to have been purged from the CIB long ago, but this is what I actually see when issuing crm_mon -frA1:

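Just to spell out my expectation: with failure-timeout=600 a failure should become eligible for expiry 10 minutes after it last occurred, and cluster-recheck-interval=300 makes the policy engine re-run at least every 5 minutes even without cluster events, so I would expect a stale fail-count to be cleared at most roughly 15 minutes after the last failure. A quick sanity check of the value actually active in the CIB, as a sketch (option support may differ slightly in 1.1.10):

    crm_attribute -t crm_config -n cluster-recheck-interval -G

and the failure-timeout default shows up under the rsc-options section of crm configure show.
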
> I think automatic deletion didn't arrive until later.

> >
> > Migration summary:
> > * Node host1:
> >    etc_ml_drbd: migration-threshold=1000000 fail-count=244 last-failure='Sat Feb 14 17:04:05 2015'
> >    spool_postfix_drbd_msg: migration-threshold=1000000 fail-count=244 last-failure='Sat Feb 14 17:04:05 2015'
> >    lib_ml_drbd: migration-threshold=1000000 fail-count=244 last-failure='Sat Feb 14 17:04:05 2015'
> >    lib_imap_drbd: migration-threshold=1000000 fail-count=244 last-failure='Sat Feb 14 17:04:05 2015'
> >    spool_imap_drbd: migration-threshold=1000000 fail-count=11654 last-failure='Sat Feb 14 17:04:05 2015'
> >    spool_ml_drbd: migration-threshold=1000000 fail-count=244 last-failure='Sat Feb 14 17:04:05 2015'
> >    documents_drbd: migration-threshold=1000000 fail-count=248 last-failure='Sat Feb 14 17:58:55 2015'
> > * Node host2:
> >    documents_drbd: migration-threshold=1000000 fail-count=548 last-failure='Sat Feb 14 16:26:33 2015'
> >
> > I have tried crm_failcount -D on the resources and also tried a cleanup... but it's still there!
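
The sort of commands meant here, as a sketch only (the resource name is one from the summary above, and exact crm_failcount option spellings differ between releases):

    # run this on host1 so it targets the local node's attribute
    crm_failcount -D -r documents_drbd

    # full cleanup of the resource's failure history via crmsh
    crm resource cleanup documents_drbd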

> Oh? Can you re-run with -VVVVVV and show us the result?

> > How can I get rid of those records (so my monitoring tools stop complaining)?
> >
> > Regards.

> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org