[ClusterLabs] Antw: Re: epic fail

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Jul 25 06:17:14 UTC 2017


>>> Jan Pokorný <jpokorny at redhat.com> wrote on 24.07.2017 at 16:40 in message
<20170724144039.GC31913 at redhat.com>:
> On 23/07/17 14:40 +0200, Valentin Vidic wrote:
>> On Sun, Jul 23, 2017 at 07:27:03AM -0500, Dmitri Maziuk wrote:
>>> So yesterday I ran yum update that pulled in the new pacemaker and tried
>>> to restart it. The node went into its usual "can't unmount drbd because
>>> kernel is using it" and got stonith'ed in the middle of the yum
>>> transaction. The end result: DRBD reports split brain, HA daemons don't
>>> start on boot, RPM database is FUBAR. I've had enough. I'm rebuilding
>>> this cluster as centos 6 + heartbeat R1.
>> 
>> It seems you did not put the node into standby before the upgrade as it
>> still had resources running.  What was the old/new pacemaker version
>> there?
> 
> Thinking out loud, it shouldn't be too hard to deliver an RPM
> plugin[1] with RPM-shipped pacemaker (it doesn't make much sense
> otherwise) that will hook into RPM transactions, putting the node
> into standby first, so as to cover the corner case where one updates
> the live cluster.  Something akin to systemd_inhibit.so.
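
(Purely as an illustration of what such a transaction hook might do, here is a
minimal Python sketch that shells out to pacemaker's crm_node and crm_attribute
command-line tools; the function names and the overall flow are assumptions for
the sake of the example, not the actual RPM plugin API referred to above.)

    # Illustrative sketch only: put the local node into standby before the
    # package transaction, assuming crm_node and crm_attribute are available.
    import subprocess

    def local_node():
        # crm_node -n prints the name the cluster uses for this node
        return subprocess.run(["crm_node", "-n"], capture_output=True,
                              text=True, check=True).stdout.strip()

    def standby(node):
        # setting the standby node attribute makes resources migrate away
        subprocess.run(["crm_attribute", "--type", "nodes", "--node", node,
                        "--name", "standby", "--update", "on"], check=True)

    def pre_transaction_hook():
        try:
            standby(local_node())
        except (OSError, subprocess.CalledProcessError):
            pass  # not a cluster node, or the cluster is down: nothing to do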

While possible, the simple objection is: if someone who installs software on a
cluster node has no idea how the cluster works, he should learn the hard way.
Also, putting a node into standby can take significant time during which you
see no progress on the console. People who have no idea how the cluster works
might think the process is hanging or waiting for some input. And finally: I
prefer to stop the whole node before updating the cluster software, instead of
setting the node to standby. So please let the user decide what to do, not the
RPM. And one last thing: if the RPM puts the node into standby (if it isn't
already!), it should also put the node back online (if it had been online
before) to support users who have no idea how the cluster works.
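
(To make the "remember the previous state and restore it" point concrete, a
rough Python sketch, again assuming pacemaker's crm_node and crm_attribute
tools; purely illustrative, this is not something the RPM currently does.)

    # Sketch: standby for the duration of the update, but only bring the
    # node back online afterwards if it had been online before.
    import subprocess

    def run(*args, **kw):
        return subprocess.run(list(args), capture_output=True, text=True, **kw)

    node = run("crm_node", "-n", check=True).stdout.strip()

    # remember the standby attribute before the update (missing means "off")
    was_standby = run("crm_attribute", "--type", "nodes", "--node", node,
                      "--name", "standby", "--query",
                      "--quiet").stdout.strip() == "on"

    run("crm_attribute", "--type", "nodes", "--node", node,
        "--name", "standby", "--update", "on", check=True)

    # ... the package transaction would run here ...

    if not was_standby:
        # restore the node only because it had been online before the update
        run("crm_attribute", "--type", "nodes", "--node", node,
            "--name", "standby", "--update", "off", check=True)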

So obviously I don't like the idea. Maybe do it the HP-UX Service Guard way:
Refuse to install/update cluster software if the node is active.
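
(Only as a sketch of that "refuse if active" idea: a pre-install check could
simply abort when the local cluster stack is up, e.g. by testing whether
crm_mon can connect; hypothetical, not how Service Guard or RPM actually
behaves.)

    # Sketch of a "refuse to update while the node is active" pre-check,
    # assuming pacemaker's crm_mon is installed; message/exit style made up.
    import subprocess, sys

    def cluster_is_up():
        # crm_mon -1 exits non-zero when it cannot connect to the cluster
        r = subprocess.run(["crm_mon", "-1"], capture_output=True, text=True)
        return r.returncode == 0

    if cluster_is_up():
        sys.exit("refusing to update cluster software: this node is still an "
                 "active cluster member; stop the cluster services here first")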

Regards,
Ulrich

> 
> Would there be an interest, though?  And would that be meaningful?
> 
> [1] http://rpm.org/devel_doc/plugins.html 
> 
> -- 
> Jan (Poki)