[ClusterLabs] Antw: Re: Q: ordering for a monitoring op only?

Ken Gaillot kgaillot at redhat.com
Tue Aug 21 15:06:45 UTC 2018


On Tue, 2018-08-21 at 07:49 +0200, Ulrich Windl wrote:
> > > > Ken Gaillot <kgaillot at redhat.com> wrote on 20.08.2018 at 16:49 in
> > > > message <1534776566.6465.5.camel at redhat.com>:
> > On Mon, 2018-08-20 at 10:51 +0200, Ulrich Windl wrote:
> > > Hi!
> > > 
> > > I wonder whether it's possible to run a monitoring op only if
> > > some
> > > specific resource is up.
> > > Background: We have some resource that runs fine without NFS, but
> > > the
> > > start, stop and monitor operations will just hang if NFS is down.
> > > In
> > > effect the monitor operation will time out, the cluster will try
> > > to
> > > recover, calling the stop operation, which in turn will time out,
> > > making things worse (i.e.: causing a node fence).
> > > 
> > > So my idea was to pause the monitoring operation while NFS is down
> > > (NFS itself is controlled by the cluster and should recover
> > > "rather
> > > soon" TM).
> > > 
> > > Is that possible?
> > 
> > A possible mitigation would be to set on-fail=block on the
> > dependent
> > resource monitor, so if NFS is down, the monitor will still time
> > out,
> > but the cluster will not try to stop it. Of course then you lose
> > the
> > ability to automatically recover from an actual resource failure.
> > 
> > The only other thing I can think of probably wouldn't be reliable:
> > you
> > could put the NFS resource in a group with an
> > ocf:pacemaker:attribute
> > resource. That way, whenever NFS is started, a node attribute will
> > be
> > set, and whenever NFS is stopped, the attribute will be unset.
> > Then,
> > you can set a rule using that attribute. For example you could make
> > the
> > dependent resource's is-managed property depend on the node
> > attribute
> > value. The reason I think it wouldn't be reliable is that if NFS
> > failed, there would be some time before the cluster stopped the NFS
> > resource and updated the node attribute, and the dependent resource
> > monitor could run during that time. But it would at least diminish
> > the
> > problem space.
> 
> Hi!
> 
> That sounds interesting, even though it's still a work-around and not
> the
> solution for the original problem. Could you show a sketch of the
> mechanism:
> How to set the attribute with the resource, and how to make the
> monitor
> operation depend on it?

ocf:pacemaker:attribute by default creates an attribute named "opa-
<rscname>" with a value of "1" when the resource is started and "0"
when it is stopped (though note that the attribute will not exist
before the first time it is started).
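For example, here is a minimal sketch of such a resource in raw CIB XML
(the IDs are arbitrary, and the parameters shown just make the agent's
defaults explicit; check the agent's metadata, e.g. with
crm_resource --show-metadata ocf:pacemaker:attribute, for the
authoritative parameter list):

  <primitive id="rsc-attr" class="ocf" provider="pacemaker" type="attribute">
    <instance_attributes id="rsc-attr-params">
      <!-- these mirror the defaults: attribute "opa-rsc-attr" is set
           to 1 on start and 0 on stop -->
      <nvpair id="rsc-attr-name" name="name" value="opa-rsc-attr"/>
      <nvpair id="rsc-attr-active" name="active_value" value="1"/>
      <nvpair id="rsc-attr-inactive" name="inactive_value" value="0"/>
    </instance_attributes>
    <operations>
      <operation id="rsc-attr-monitor" name="monitor" interval="10s"/>
    </operations>
  </primitive>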

Therefore by creating a group of the NFS resource and an attribute
resource, the attribute will be set to 1 immediately after the NFS
resource is started, and to 0 immediately before the NFS resource is
stopped.

Conceptually:

  create resource rsc-nfs
  create resource rsc-attr
  create group rsc-group = rsc-nfs rsc-attr
  create resource rsc-dependent
     meta-attributes
        rule: opa-rsc-attr=0 -> is-managed=false
        default rule: is-managed=true
  asymmetric order start rsc-group then start rsc-dependent
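
As a rough sketch in raw CIB XML (the resource agents, IDs, and
intervals are placeholders I picked for illustration; the part that
matters is the rule-based meta_attributes block on rsc-dependent):

  <group id="rsc-group">
    <primitive id="rsc-nfs" class="ocf" provider="heartbeat" type="Filesystem">
      <!-- NFS mount parameters omitted; any NFS resource works here -->
    </primitive>
    <!-- the attribute resource sketched above -->
    <primitive id="rsc-attr" class="ocf" provider="pacemaker" type="attribute"/>
  </group>

  <primitive id="rsc-dependent" class="ocf" provider="heartbeat" type="Dummy">
    <!-- the higher-scored meta_attributes set wins while its rule is
         satisfied; otherwise the default set applies -->
    <meta_attributes id="rsc-dependent-nfs-down" score="1">
      <rule id="rsc-dependent-nfs-down-rule">
        <expression id="rsc-dependent-nfs-down-expr"
                    attribute="opa-rsc-attr" operation="eq" value="0"/>
      </rule>
      <nvpair id="rsc-dependent-nfs-down-unmanaged"
              name="is-managed" value="false"/>
    </meta_attributes>
    <meta_attributes id="rsc-dependent-defaults" score="0">
      <nvpair id="rsc-dependent-defaults-managed"
              name="is-managed" value="true"/>
    </meta_attributes>
    <operations>
      <operation id="rsc-dependent-monitor" name="monitor" interval="30s"/>
    </operations>
  </primitive>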

With that, at start-up, rsc-dependent will wait for rsc-group to start
due to the order constraint (that takes care of the initial case where
the attribute does not yet exist). Because the constraint is asymmetric
(symmetrical=false), rsc-group can fail or stop without forcing
rsc-dependent to stop.
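
The order constraint itself could look like this in XML (again just a
sketch; kind defaults to Mandatory):

  <rsc_order id="order-group-then-dependent"
             first="rsc-group" first-action="start"
             then="rsc-dependent" then-action="start"
             symmetrical="false"/>
  <!-- symmetrical="false": stopping rsc-group does not imply a stop
       of rsc-dependent -->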

If rsc-group fails (or is disabled), it will be stopped, at which point
the attribute will go to 0, and rsc-dependent will become unmanaged.
The monitor will still run but will be ignored. If the monitor happens
to run (and complete) between when NFS actually fails and when the
cluster stops rsc-group, there will still be trouble. But otherwise,
the cluster will ignore rsc-dependent's monitor failures.

The advantage of that over on-fail=block is that as long as rsc-group
is running, the cluster will still recover rsc-dependent if it fails.
Only when rsc-group is down will the cluster ignore rsc-dependent.
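
If you want to confirm that the attribute really flips, you should be
able to query it by hand with attrd_updater (assuming the default
opa-<rscname> naming and a node named node1; the exact output format
varies by version):

  # Expect value "1" while rsc-group is running on node1, and "0"
  # once rsc-group has been stopped or disabled:
  attrd_updater --query --name opa-rsc-attr --node node1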

> > Probably any dynamic solution would have a similar race condition
> > --
> > the NFS will be failed in reality for some amount of time before
> > the
> > cluster detects the failure, so the cluster could never prevent the
> > monitor from running during that window.
> 
> I agree completely.
> 
> Regards,
> Ulrich
> 
> > 
> > > And before you ask: No, I have not written the RA that has this
> > > problem; a multi-million-dollar company wrote it. (Years before, I
> > > had written a monitor for HP-UX's cluster that did not have this
> > > problem, even though the configuration files were read from NFS.
> > > It's not magic: just periodically copy them to shared memory, and
> > > read the config from shared memory.)
> > > 
> > > Regards,
> > > Ulrich
-- 
Ken Gaillot <kgaillot at redhat.com>
