[ClusterLabs] stonithd/fenced filling up logs

Israel Brewster israel at ravnalaska.net
Tue Oct 4 23:09:00 UTC 2016


On Oct 4, 2016, at 3:03 PM, Digimer <lists at alteeve.ca> wrote:
> 
> On 04/10/16 06:50 PM, Israel Brewster wrote:
>> On Oct 4, 2016, at 2:26 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>> 
>>> On 10/04/2016 11:31 AM, Israel Brewster wrote:
>>>> I sent this a week ago, but never got a response, so I'm sending it
>>>> again in the hopes that it just slipped through the cracks. It seems to
>>>> me that this should just be a simple mis-configuration on my part
>>>> causing the issue, but I suppose it could be a bug as well.
>>>> 
>>>> I have two two-node clusters set up using corosync/pacemaker on CentOS
>>>> 6.8. One cluster is simply sharing an IP, while the other one has
>>>> numerous services and IPs set up between the two machines in the
>>>> cluster. Both appear to be working fine. However, I was poking around
>>>> today, and I noticed that on the single IP cluster, corosync, stonithd,
>>>> and fenced were using "significant" amounts of processing power - 25%
>>>> for corosync on the current primary node, with fenced and stonithd often
>>>> showing 1-2% (not horrible, but more than any other process). In looking
>>>> at my logs, I see that they are dumping messages like the following to
>>>> the messages log every second or two:
>>>> 
>>>> Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:  warning: get_xpath_object:
>>>> No match for //@st_delegate in /st-reply
>>>> Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]:   notice: remote_op_done:
>>>> Operation reboot of fai-dbs1 by fai-dbs2 for
>>>> stonith_admin.cman.15835 at fai-dbs2.c5161517: No such device
>>>> Sep 27 08:51:50 fai-dbs1 crmd[4855]:   notice: tengine_stonith_notify:
>>>> Peer fai-dbs1 was not terminated (reboot) by fai-dbs2 for fai-dbs2: No
>>>> such device (ref=c5161517-c0cc-42e5-ac11-1d55f7749b05) by client
>>>> stonith_admin.cman.15835
>>>> Sep 27 08:51:50 fai-dbs1 fence_pcmk[15393]: Requesting Pacemaker fence
>>>> fai-dbs2 (reset)
>>> 
>>> The above shows that CMAN is asking pacemaker to fence a node. Even
>>> though fencing is disabled in pacemaker itself, CMAN is configured to
>>> use pacemaker for fencing (fence_pcmk).
>> 
>> I never did any specific configuring of CMAN. Perhaps that's the
>> problem? Did I miss some configuration steps on setup? I just followed the
>> directions
>> here: http://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs,
>> which disabled stonith in pacemaker via the
>> "pcs property set stonith-enabled=false" command. Are there separate CMAN
>> configs I need to do to get everything copacetic? If so, can you point
>> me to some sort of guide/tutorial for that?
> 
> Disabling stonith is not possible in cman, and very ill advised in
> pacemaker. This is a mistake a lot of "tutorials" make when the author
> doesn't understand the role of fencing.
> 
> In your case, pcs set up cman to use the fence_pcmk "passthrough" fence
> agent, as it should. So when something went wrong, corosync detected it and
> informed cman, which then asked pacemaker to fence the peer. With
> pacemaker not having stonith configured and enabled, it could do
> nothing. So pacemaker returned that the fence failed and cman went into
> an infinite loop trying again and again to fence (as it should have).
> 
> You must configure stonith (exactly how depends on your hardware), then
> enable stonith in pacemaker.
> 

Gotcha. There is nothing special about the hardware; it's just two physical boxes connected to the network. So I guess I've got a choice: either a) live with the logging/load situation (since the system otherwise works perfectly as-is), or b) spend some time researching stonith to figure out what it does and how to configure it. Thanks for the pointers.
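
For anyone finding this in the archives, option (b) might look roughly like the sketch below. It rests on an assumption that is not established anywhere in this thread: that both boxes have an IPMI-capable management controller (BMC) reachable over the network. If they do not, a different fence agent (a switched PDU, for example) would be needed. The device names fence_dbs1 and fence_dbs2, the 192.0.2.x addresses, and the credentials are all hypothetical placeholders; only the node names fai-dbs1 and fai-dbs2 come from the logs above. Exact pcs syntax and fence_ipmilan parameter names can vary between versions, so treat this as a starting point rather than a tested recipe.

```shell
# Sketch only: assumes each node has an IPMI BMC (NOT confirmed in this thread).
# Device names, addresses, and credentials below are placeholders.

# One fence device per node, each pointing at that node's BMC.
pcs stonith create fence_dbs1 fence_ipmilan \
    pcmk_host_list="fai-dbs1" ipaddr="192.0.2.11" \
    login="admin" passwd="changeme" lanplus=1

pcs stonith create fence_dbs2 fence_ipmilan \
    pcmk_host_list="fai-dbs2" ipaddr="192.0.2.12" \
    login="admin" passwd="changeme" lanplus=1

# Keep each fence device off the node it is meant to fence.
pcs constraint location fence_dbs1 avoids fai-dbs1
pcs constraint location fence_dbs2 avoids fai-dbs2

# Re-enable stonith in pacemaker (the tutorial had set this to false).
pcs property set stonith-enabled=true

# Review the configured fence devices.
pcs stonith show
```

With something like this in place, the fence_pcmk requests coming from cman would have a real device to act on, so they should no longer fail with "No such device" and be retried indefinitely.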

> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




