[ClusterLabs] op stop timeout update causes monitor op to fail?
dennisml at conversis.de
Tue Sep 17 15:41:30 EDT 2019
On 11.09.19 16:51, Ken Gaillot wrote:
> On Tue, 2019-09-10 at 09:54 +0200, Dennis Jacobfeuerborn wrote:
>> I just updated the timeout for the stop operation on an NFS cluster,
>> and while the timeout was being updated the status suddenly showed this:
>> Failed Actions:
>> * nfsserver_monitor_10000 on nfs1aqs1 'unknown error' (1): call=41,
>> status=Timed Out, exitreason='none',
>> last-rc-change='Tue Aug 13 14:14:28 2019', queued=0ms, exec=0ms
> Are you sure it wasn't already showing that? The timestamp of that
> error is Aug 13, while the logs show the timeout update happening in September.
I'm fairly certain. I did a "pcs status" before that operation to check
the state of the cluster.
> Old errors will keep showing up in status until you manually clean them
> up (with crm_resource --cleanup or a higher-level tool equivalent), or
> any configured failure-timeout is reached.
> In any case, the log excerpt shows that nothing went wrong during the
> time it covers. There were no actions scheduled in that transition in
> response to the timeout change (which is as expected).
What about this line:
pengine: warning: unpack_rsc_op_failure: Processing failed op monitor
for nfsserver on nfs1aqs1: unknown error (1)
I cleaned up the error and tried the timeout update again, and this time
it worked. The corresponding line in the log now reads:
pengine: info: determine_op_status: Operation monitor found resource
nfsserver active on nfs1aqs1
What I'm wondering is whether this could be a race condition between
pacemaker updating the resource configuration and the monitor operation
running at the same time.
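For completeness, the change that triggered this was an operation-timeout update along the following lines (a sketch; the timeout and failure-timeout values are illustrative, not the ones actually used):

```shell
# Update the stop operation timeout on the nfsserver resource
pcs resource update nfsserver op stop timeout=120s

# Optionally let old failure records expire on their own instead of
# requiring a manual crm_resource --cleanup
pcs resource meta nfsserver failure-timeout=600
```

The failure-timeout meta attribute is what Ken refers to above: when set, pacemaker clears a recorded failure automatically once that interval has passed.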