[ClusterLabs] stonithd/fenced filling up logs

Tue Sep 27 17:03:47 UTC 2016

I have two two-node clusters set up using corosync/pacemaker on CentOS 6.8. One cluster simply shares an IP, while the other has numerous services and IPs set up between its two machines. Both appear to be working fine. However, while poking around today I noticed that on the single-IP cluster, corosync, stonithd, and fenced were using "significant" amounts of CPU: 25% for corosync on the current primary node, with fenced and stonithd often showing 1-2% (not horrible, but more than any other process). Looking at my logs, I see that they are dumping messages like the following to the messages log every second or two:

Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:  warning: get_xpath_object: No match for //@st_delegate in /st-reply
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:   notice: remote_op_done: Operation reboot of fai-dbs1 by fai-dbs2 for stonith_admin.cman.15835 at fai-dbs2.c5161517: No such device
Sep 27 08:51:50 fai-dbs1 crmd[4855]:   notice: tengine_stonith_notify: Peer fai-dbs1 was not terminated (reboot) by fai-dbs2 for fai-dbs2: No such device (ref=c5161517-c0cc-42e5-ac11-1d55f7749b05) by client stonith_admin.cman.15835
Sep 27 08:51:50 fai-dbs1 fence_pcmk[15393]: Requesting Pacemaker fence fai-dbs2 (reset)
Sep 27 08:51:50 fai-dbs1 stonith_admin[15394]:   notice: crm_log_args: Invoked: stonith_admin --reboot fai-dbs2 --tolerance 5s --tag cman 
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:   notice: handle_request: Client stonith_admin.cman.15394.2a97d89d wants to fence (reboot) 'fai-dbs2' with device '(any)'
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:   notice: initiate_remote_stonith_op: Initiating remote operation reboot for fai-dbs2: bc3f5d73-57bd-4aff-a94c-f9978aa5c3ae (0)
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:   notice: stonith_choose_peer: Couldn't find anyone to fence fai-dbs2 with <any>
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:  warning: get_xpath_object: No match for //@st_delegate in /st-reply
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:    error: remote_op_done: Operation reboot of fai-dbs2 by fai-dbs1 for stonith_admin.cman.15394 at fai-dbs1.bc3f5d73: No such device
Sep 27 08:51:50 fai-dbs1 crmd[4855]:   notice: tengine_stonith_notify: Peer fai-dbs2 was not terminated (reboot) by fai-dbs1 for fai-dbs1: No such device (ref=bc3f5d73-57bd-4aff-a94c-f9978aa5c3ae) by client stonith_admin.cman.15394
Sep 27 08:51:50 fai-dbs1 fence_pcmk[15393]: Call to fence fai-dbs2 (reset) failed with rc=237
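To see how often each fencing request is failing, a short script can tally the stonith-ng `remote_op_done` results and the fence_pcmk failures in the log. This is a minimal sketch, assuming the syslog line format shown in the excerpt above; `summarize_fence_results`, `fence_pcmk_failures`, and `decode_rc` are hypothetical helpers, not part of any Pacemaker tool. `decode_rc` rests on a further assumption: that rc=237 is a negative errno truncated to 8 bits, in which case 256 - 237 = 19, which is ENODEV ("No such device") on Linux and matches the stonith-ng result text.

```python
import errno
import re
from collections import Counter

# stonith-ng "remote_op_done" lines, in the format seen in the excerpt (assumption:
# other Pacemaker versions may word these lines differently).
OP_DONE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) stonith-ng\[\d+\]:"
    r".*?remote_op_done: Operation (?P<action>\w+) of (?P<target>\S+)"
    r" by (?P<executor>\S+) for (?P<client>\S+) at \S+: (?P<result>.+)$"
)

# fence_pcmk lines reporting a failed call and its return code.
PCMK_FAIL = re.compile(
    r"fence_pcmk\[\d+\]: Call to fence (?P<target>\S+) "
    r"\((?P<action>\w+)\) failed with rc=(?P<rc>\d+)"
)


def decode_rc(rc):
    """Assumption: an rc above 128 is a negated errno truncated to 8 bits."""
    if 128 < rc < 256:
        return errno.errorcode.get(256 - rc)
    return None


def summarize_fence_results(lines):
    """Count completed stonith operations keyed by (action, target, result)."""
    counts = Counter()
    for line in lines:
        m = OP_DONE.match(line)
        if m:
            counts[(m["action"], m["target"], m["result"].strip())] += 1
    return counts


def fence_pcmk_failures(lines):
    """List (target, action, rc, errno name or None) for each failed fence_pcmk call."""
    failures = []
    for line in lines:
        m = PCMK_FAIL.search(line)
        if m:
            rc = int(m["rc"])
            failures.append((m["target"], m["action"], rc, decode_rc(rc)))
    return failures
```

Passing an open copy of the messages log (for example `summarize_fence_results(open("/var/log/messages"))`) yields keys such as `("reboot", "fai-dbs2", "No such device")`, so a count that keeps climbing confirms the same request is being retried rather than new events occurring.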

After seeing this on the one cluster, I checked the logs on the other, and sure enough I'm seeing the same thing there. As I mentioned, both nodes in both clusters *appear* to be operating correctly. For example, the output of "pcs status" on the small cluster is this:

[root@fai-dbs1 ~]# pcs status
Cluster name: dbs_cluster
Last updated: Tue Sep 27 08:59:44 2016
Last change: Thu Mar  3 06:11:00 2016
Stack: cman
Current DC: fai-dbs1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
1 Resources configured

Online: [ fai-dbs1 fai-dbs2 ]

Full list of resources:

 virtual_ip	(ocf::heartbeat:IPaddr2):	Started fai-dbs1

On the larger cluster, services are running across both nodes, and I've been able to move them back and forth without issue. Both clusters have the stonith-enabled property set to false and no-quorum-policy set to ignore (since each cluster has only two nodes).
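The stonith-enabled and no-quorum-policy settings are Pacemaker cluster properties stored in the CIB, in the `cluster_property_set` under `crm_config`. As a sketch, assuming the XML has been saved from `cibadmin --query`, the following reads those properties back so the values on each cluster can be compared. `cluster_properties` is a hypothetical helper, and `SAMPLE_CIB` is a trimmed illustration, not either cluster's actual configuration.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative CIB; a real one from `cibadmin --query` has many more sections.
SAMPLE_CIB = """
<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
      </cluster_property_set>
    </crm_config>
  </configuration>
</cib>
"""


def cluster_properties(cib_xml):
    """Return {name: value} for every nvpair in the CIB's crm_config property sets."""
    root = ET.fromstring(cib_xml)
    props = {}
    # Cluster-wide options live under <configuration><crm_config><cluster_property_set>.
    for nvpair in root.findall("./configuration/crm_config/cluster_property_set/nvpair"):
        props[nvpair.get("name")] = nvpair.get("value")
    return props
```

Running it against the saved CIB from each cluster shows whether the two property values really match on both, which is worth confirming before looking elsewhere for the source of the fencing requests.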

What could be causing the log messages? Is the CPU usage normal, or might there be something I can do about that as well? Thanks.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------
