[ClusterLabs] epic fail

Mon Jul 24 15:38:40 UTC 2017

On Mon, 2017-07-24 at 17:13 +0200, Kristián Feldsam wrote:
> Hmm, so when you know that it also happens when putting the node in
> standby, then why do you run yum update on a live cluster? It must be
> clear that the node will be fenced.

Standby is not necessary, it's just a cautious step that allows the
admin to verify that all resources moved off correctly. The restart that
yum does should be sufficient for pacemaker to move everything.
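
As a rough sketch of that cautious sequence (not a prescribed procedure), the
steps could be wrapped as below. It assumes the pcs 0.9.x syntax
"pcs cluster standby"; newer pcs releases use "pcs node standby" instead. The
helper names run and cautious_update and the DRY_RUN switch are hypothetical,
introduced only for illustration.

```shell
# Sketch only: standby, check status, update, unstandby.
# Assumes pcs 0.9.x command syntax and that this runs on the node being updated.
# DRY_RUN=1 prints the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

cautious_update() {
    node="$1"
    # Ask the cluster to move all resources off this node.
    run pcs cluster standby "$node" || return 1
    # One-shot status display; resources may take a moment to move, so the
    # admin may want to re-check before continuing.
    run crm_mon -1 || return 1
    # The package update restarts pacemaker on this node.
    run yum -y update || return 1
    # Allow the node to host resources again.
    run pcs cluster unstandby "$node"
}
```

Running "DRY_RUN=1 cautious_update node1" first shows the four commands
without touching the cluster.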

A restart shouldn't lead to fencing unless something is going seriously
wrong. I'm not familiar with the "kernel is using it" message; I haven't
run into that before.

The only case where special handling was needed before a yum update is a
node running pacemaker_remote instead of the full cluster stack, before
pacemaker 1.1.15.
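
To check whether a given node falls into that older case, a version
comparison along these lines might help. This is a sketch under assumptions:
it relies on "sort -V" from GNU coreutils, and the function names version_lt
and needs_remote_special_handling are hypothetical. It only reports whether a
version is older than 1.1.15; it does not perform the special handling itself.

```shell
# version_lt A B: succeeds if version A sorts strictly before version B.
# Assumes GNU "sort -V" (version sort) is available.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeeds if the given pacemaker version predates 1.1.15.
needs_remote_special_handling() {
    version_lt "$1" "1.1.15"
}
```

On an RPM-based system the installed version could be fed in with something
like needs_remote_special_handling "$(rpm -q --qf '%{VERSION}' pacemaker-remote)",
assuming the package is named pacemaker-remote there.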

> Would you post your pacemaker config and some logs?
> 
> Best regards, Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: support at feldhost.cz
> 
> www.feldhost.cz - FeldHost™ – professional hosting and server
> services at fair prices.
> 
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> Company ID (IČ): 290 60 958, VAT ID (DIČ): CZ290 60 958
> C 200350, registered at the Municipal Court in Prague
> 
> Bank: Fio banka a.s.
> Account number: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010 0000 0024 0033 0446
> 
> > On 24 Jul 2017, at 17:04, Dimitri Maziuk <dmaziuk at bmrb.wisc.edu>
> > wrote:
> > 
> > On 07/24/2017 09:40 AM, Jan Pokorný wrote:
> > 
> > > Would there be an interest, though?  And would that be meaningful?
> > 
> > IMO the only reason to put a node in standby is if you want to reboot
> > the active node with no service interruption. For anything else,
> > including a reboot with service interruption (during maintenance
> > window), it's a no.
> > 
> > This is akin to "your mouse has moved, windows needs to be restarted".
> > Except the mouse thing is a joke whereas those "standby" clowns appear
> > to be serious.
> > 
> > With this particular failure, something in the Red Hat patched kernel
> > (NFS?) does not release the DRBD filesystem. It happens when I put the
> > node in standby as well; the only difference is not messing up the RPM
> > database, which isn't that hard to fix. Since I have had several
> > CentOS 6 + DRBD + NFS + heartbeat R1 pairs running happily for years,
> > I have to conclude that CentOS 7 is simply the wrong tool for this
> > particular job.
> > 
> > -- 
> > Dimitri Maziuk
> > Programmer/sysadmin
> > BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu