[ClusterLabs] Antw: Re: Corosync ring shown faulty between healthy nodes & networks (rrp_mode: passive)

Fri Oct 7 06:18:32 UTC 2016

>>> Dimitri Maziuk <dmaziuk at bmrb.wisc.edu> wrote on 06.10.2016 at 18:02 in
message <696bb029-2b44-aa4b-322e-b399dd7416aa at bmrb.wisc.edu>:
> On 10/06/2016 09:26 AM, Klaus Wenninger wrote:
> 
>> Usually one - at least I so far - would rather think that having
>> the awareness of redundancy/cluster as high up as possible in the
>> protocol/application stack would open up possibilities for more
>> appropriate reactions.
> 
> The obvious counter-example is a hard disk failure: they're common on
> commodity spinning-rust drives, and they're cheap and easy to handle at
> a lower level by throwing in a 2nd drive as an mdadm RAID-1 mirror.

Just for the record: we had a brand-new, non-spinning, non-rusty SSD fail within one week of use...

Any hardware may fail at any time. We even had an onboard NIC that stopped operating correctly one day, and we have had CPU cache errors, RAM parity errors, PCI bus errors, and everything else you can imagine.

> 
> -- 
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu