[ClusterLabs] Antw: Re: Q: ordering for a monitoring op only?

Thu Aug 23 10:59:34 UTC 2018

>>> Ryan Thomas <developmentrjt at gmail.com> wrote on 21.08.2018 at 17:38 in
message
<CAE_QAjk4gNNAbLWa-WEp-x_4nu640Y8dHbp3D3QCZ_RDqC7e8w at mail.gmail.com>:
> You could accomplish this by creating a custom RA which normally acts as a
> pass-through and calls the "real" RA.  However, it intercepts "monitor"
> actions, checks NFS, and if NFS is down it returns success; otherwise it
> passes the monitor action through to the real RA.  If NFS fails while the
> monitor action is in flight, the custom RA can intercept the failure, check
> whether NFS is down, and if so change the failure to a success.

Hi!

This sounds like an interesting approach, but I wonder how to avoid a monitoring timeout, i.e. what value should the wrapper return when NFS is down? I'm missing a return value like CANNOT_CHECK_AT_THE_MOMENT_SO_PLEASE_ASSUME_RESOURCE_STILL_HAS_ITS_LAST_STATE ;-)

Unless I can return such a value, the wrapper RA will have to wait (possibly causing a timeout). OK, the wrapper RA could cache its last return value and reuse that when NFS is down.
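
The caching idea could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions, not a tested resource agent: the helper names (nfs_available, read_cached, write_cached, monitor), the state-file location, and the choice to report success when no cached value exists yet are all hypothetical. The NFS probe runs in a child process with a timeout, so a hung mount cannot block the wrapper itself, and the state file has to live on local storage rather than on NFS.

```python
import os
import subprocess
import sys

# OCF return codes used by the sketch.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def nfs_available(path, timeout=5.0):
    """Probe `path` in a child process so a hung mount cannot block us."""
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return probe.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Do not wait after kill(): the child may be stuck in uninterruptible I/O.
        probe.kill()
        return False


def read_cached(state_file, default=OCF_SUCCESS):
    """Return the last stored monitor result (assumption: success if none)."""
    try:
        with open(state_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default


def write_cached(state_file, rc):
    """Store the monitor result atomically; state_file must be on local disk."""
    tmp = state_file + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(rc))
    os.replace(tmp, state_file)


def monitor(real_monitor, nfs_ok, state_file):
    """Wrapper monitor: reuse the cached result whenever NFS is down."""
    if not nfs_ok():
        return read_cached(state_file)
    rc = real_monitor()
    # NFS may have failed while the real monitor was in flight.
    if rc != OCF_SUCCESS and not nfs_ok():
        return read_cached(state_file)
    write_cached(state_file, rc)
    return rc
```

In a real wrapper, real_monitor would execute the vendor RA with the "monitor" action (for example via subprocess.call), and nfs_ok would call nfs_available on the mount point in question. The timeout passed to nfs_available has to stay well below the monitor operation's own timeout, otherwise the probe itself causes the timeout it is meant to avoid.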

Regards,
Ulrich

> 
> On Mon, Aug 20, 2018 at 3:51 AM Ulrich Windl <
> Ulrich.Windl at rz.uni-regensburg.de> wrote:
> 
>> Hi!
>>
>> I wonder whether it's possible to run a monitoring op only if some
>> specific resource is up.
>> Background: We have some resource that runs fine without NFS, but the
>> start, stop and monitor operations will just hang if NFS is down. In effect
>> the monitor operation will time out, the cluster will try to recover,
>> calling the stop operation, which in turn will time out, making things
>> worse (i.e.: causing a node fence).
>>
>> So my idea was to pause the monitoring operation while NFS is down (NFS
>> itself is controlled by the cluster and should recover "rather soon" TM).
>>
>> Is that possible?
>> And before you ask: No, I did not write the RA that has the problem; a
>> multi-million-dollar company wrote it. (Years earlier I had written a monitor
>> for HP-UX's cluster that did not have this problem, even though the
>> configuration files were read from NFS. It's not magic: just periodically
>> copy them to shared memory, and read the config from shared memory.)
>>
>> Regards,
>> Ulrich
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org 
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org 
>>