[ClusterLabs] stonithd/fenced filling up logs

Digimer lists at alteeve.ca
Wed Oct 5 13:19:29 EDT 2016


On 05/10/16 01:14 PM, Dimitri Maziuk wrote:
> On 10/05/2016 11:56 AM, Israel Brewster wrote:
> 
>> As you say, though, this is something I'll simply need to get over if I want 
>> real HA
> 
> The sad truth is making simple stupid stuff that Just Works(tm) is not
> cool. Making stuff that will run a cluster of 1001 randomly mixed
> active, somewhat-active, mostly-passive, etc. nodes, power-off anything
> it doesn't like, when that fails: fence it with the Lights-Out
> Management System Du Jour, when that fails: turn the power off at the
> networked PDUs... and bring you warmed-up slippers in the morning, now
> that's cool.

If you have "1001 randomly mixed ..." services, you might want to break
up your software into smaller clusters. Also, iLO, DRAC, iRMC, RSA...
They're all basically IPMI plus some vendor features. Not sure why you'd
refer to them as "System Du Jour"...

> And when you ask: if there are only two nodes and one can't talk to the
> other, how does it know that it's the other node and not itself that
> needs to be fenced? The "cool" developers answer: well, we just add a
> delay so they don't try to fence each other at the same time.
> 
> D'oh.
> </rant>

Explain why this is a bad idea, because I don't see anything wrong with it.
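
To make the delay idea concrete, here is a toy model in Python. It is
only a sketch, not Pacemaker or fence-agent code: the function name
fence_race and the node names are made up for illustration, and it
assumes each node has a fixed delay that a peer must wait out before a
fence action against that node fires. With unequal delays, the node
protected by the longer delay fences its peer first and survives, so
the two nodes can't kill each other at the same time.

```python
def fence_race(delay_on_target):
    """Toy model of a two-node fence race (hypothetical helper).

    delay_on_target maps a node name to the number of seconds a peer
    must wait before its fence action against that node fires.
    Returns the name of the surviving node, or None if both fence
    actions would fire at the same moment.
    """
    if len(delay_on_target) != 2:
        raise ValueError("this sketch only models a two-node cluster")

    (node_a, delay_a), (node_b, delay_b) = delay_on_target.items()

    if delay_a == delay_b:
        # No delay difference: both fence calls can land together,
        # so neither node is guaranteed to survive.
        return None

    # The node with the longer delay on its own fence device is shot
    # later, so it gets to fence its peer first and stays up.
    return node_a if delay_a > delay_b else node_b


# node1 is protected by a 15 second delay, node2 has none.
print(fence_race({"node1": 15, "node2": 0}))
```

In this model the delay does not decide which node is "right"; it only
guarantees that exactly one node wins the race when both are alive but
can't talk to each other.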

> I think your problem is centos 6. Either switch to 7 or ditch pacemaker
> and go heartbeat in haresources mode + mon and a little perl scripting.
> I'm running both, the haresources version. I get about 1 instance of the
> scary split brain per 2 cluster/years and almost all of them are caused
> by me doing something stupid.

That is an insane recommendation. Heartbeat has been deprecated for many
years. There is no plan to restart development, either. Meanwhile,
CentOS/RHEL 6 is perfectly fine and stable and will be supported until
at least 2020.

https://alteeve.ca/w/History_of_HA_Clustering

"scary split brain per 2 cluster/years"

Split-brains are about the worst thing that can happen in HA. At the
very best, you lose your services. At worst, you corrupt your data. Why
risk that at all when fencing solves the problem perfectly fine?

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



