[Pacemaker] [Linux-HA] Antw: Re: Q: unmanaged MD-RAID & auto-recovery
Lars Ellenberg
lars.ellenberg at linbit.com
Fri Nov 25 19:31:16 UTC 2011
On Fri, Nov 25, 2011 at 01:54:33PM +0100, Florian Haas wrote:
> On 11/25/11 13:29, Lars Ellenberg wrote:
> >> From the log snippet it's
> >> not entirely clear whether that's a recurring monitor (interval ==
> >> whatever you configured, or 20 if default), or a probe (interval == 0).
> >>
> >> A recurring monitor clearly should not happen at all when unmanaged.
> >
> > That is incorrect.
> >
> > With is-managed=false, Pacemaker still monitors the resource. It only
> > stops sending start/stop etc. commands to that resource.
>
> My understanding was that only probes would still occur (on
> cluster-recheck-interval, or when new nodes joined the cluster). And I
> maintain that that would be the intuitively "correct" behavior for
> unmanaged resources. Andrew?
Well, your understanding or intuition seems to mislead you this time.
But in case you think I make shit up ;-)
http://www.gossamer-threads.com/lists/linuxha/pacemaker/70606#70606
> > If the implementation of the monitor action in the RA does trigger
> > "auto-recovery" or other things, well, then it does.
>
> Which seems to operate on the same assumption, really, that an unmanaged
> resource never has its monitor action executed.
>
> I still think that this attempt to auto-recover from _within_ the
> monitor action is a bit insane, but maybe lmb (who implemented that
> part, as per git blame) would be able to share his thoughts as to why he
> did it that way.
Well, that's the only place where an auto-recovery of a degraded
(not yet failed!) md array can be triggered from pacemaker.
There is no $OCF_DEGRADED status code,
and no "try-resource-internal-recovery" action.
And if there were, what else could it do?
Or would you rather have some external monitoring page an operator,
who then logs in and performs the same actions...
If you run md over, e.g., "long distance" iSCSI
and you lose one of the links, md will detach that leg.
If the link comes back, the monitor action is where the array
could then recover that leg and start to resync.
Besides, you explicitly have to request this behaviour of the RA.
I think that approach is perfectly sane.
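
To make the "degraded, not yet failed" distinction concrete, here is a
rough Python sketch of what a monitor could decide. It is not the RA's
actual code (the RA is a shell script). The names parse_mdstat,
monitor_decision and auto_recover, and the returned strings, are made up
for illustration. The only thing it relies on is the "[total/active]"
member count that /proc/mdstat prints per array.

```python
import re

# Header line of an array in /proc/mdstat, e.g. "md0 : active raid1 sdb1[1]".
_ARRAY = re.compile(r"^(md\S*)\s*:")
# Member-count field on the following status line, e.g. "[2/1]".
_COUNTS = re.compile(r"\[(\d+)/(\d+)\]")

# Sample input for illustration only, not taken from a real system.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1]
      1048512 blocks [2/1] [_U]

md1 : active raid1 sdc1[0] sdd1[1]
      2096064 blocks [2/2] [UU]

unused devices: <none>
"""


def parse_mdstat(text):
    """Map array name -> (configured members, active members)."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = _ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        counts = _COUNTS.search(line)
        if current is not None and counts:
            arrays[current] = (int(counts.group(1)), int(counts.group(2)))
            current = None
    return arrays


def monitor_decision(text, array, auto_recover=False):
    """Hypothetical decision a monitor could take for one array.

    auto_recover stands in for the explicit opt-in mentioned above;
    recovery is only suggested when it is set.
    """
    state = parse_mdstat(text).get(array)
    if state is None:
        return "not-running"
    total, active = state
    if active >= total:
        return "ok"
    return "try-recovery" if auto_recover else "degraded"


if __name__ == "__main__":
    for name in ("md0", "md1", "md2"):
        print(name, monitor_decision(SAMPLE, name, auto_recover=True))
```

Since there is no separate status code for "degraded", a monitor that
sees this state can either report the resource as running and do
nothing, or try the recovery itself. The flag only chooses between
those two.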
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.