[ClusterLabs] epic fail

Mon Jul 24 10:40:39 EDT 2017

On 23/07/17 14:40 +0200, Valentin Vidic wrote:
> On Sun, Jul 23, 2017 at 07:27:03AM -0500, Dmitri Maziuk wrote:
>> So yesterday I ran yum update that pulled in the new pacemaker and tried to
>> restart it. The node went into its usual "can't unmount drbd because kernel
>> is using it" and got stonith'ed in the middle of yum transaction. The end
>> result: DRBD reports split brain, HA daemons don't start on boot, RPM
>> database is FUBAR. I've had enough. I'm rebuilding this cluster as centos 6
>> + heartbeat R1.
> 
> It seems you did not put the node into standby before the upgrade as it
> still had resources running.  What was the old/new pacemaker version there?

Thinking out loud, it shouldn't be too hard to deliver an RPM
plugin[1] with RPM-shipped pacemaker (it doesn't make much sense
otherwise) that would hook into RPM transactions and put the node
into standby first, so as to cover the corner case where one updates
a live cluster.  Something akin to systemd_inhibit.so.
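To make the idea concrete, here is a rough sketch of only the decision logic such a hook might run before a transaction. It is not an RPM plugin: real RPM plugins are C shared objects loaded by rpm itself, and this Python just models the behaviour. The package prefix list and the function names are hypothetical. It assumes pacemaker's `crm_standby` tool, where `-v on` sets standby for the local node.

```python
import subprocess
from typing import Callable, Iterable, List, Optional

# Hypothetical list: package name prefixes whose update should trigger standby.
CLUSTER_PACKAGE_PREFIXES = ("pacemaker", "corosync", "drbd", "kmod-drbd", "resource-agents")


def transaction_touches_cluster(package_names: Iterable[str]) -> bool:
    """Return True if any package in the transaction looks cluster-related."""
    return any(name.startswith(CLUSTER_PACKAGE_PREFIXES) for name in package_names)


def standby_command(enable: bool) -> List[str]:
    """Build the crm_standby invocation for the local node (assumed CLI usage)."""
    return ["crm_standby", "-v", "on" if enable else "off"]


def pre_transaction(
    package_names: Iterable[str],
    run: Callable[..., object] = subprocess.run,
) -> Optional[List[str]]:
    """Hypothetical pre-transaction hook: put the node into standby if needed.

    Returns the command that was run, or None when the transaction does not
    touch cluster packages. `run` is injectable so the logic can be tested
    without a live cluster.
    """
    if not transaction_touches_cluster(package_names):
        return None
    cmd = standby_command(True)
    run(cmd, check=True)  # raises on failure when using subprocess.run
    return cmd
```

Note that setting standby only changes a node attribute. Resources are then stopped or moved asynchronously, so a real hook would presumably also have to wait for the node to drain before letting the transaction proceed. Taking the node out of standby afterwards is deliberately left to the administrator here.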

Would there be any interest, though?  And would that be meaningful?

[1] http://rpm.org/devel_doc/plugins.html

-- 
Jan (Poki)