[Pacemaker] monitor on-fail=ignore not restarting when resource reported as stopped

Michael Schwartzkopff ms at sys4.de
Fri Dec 6 10:50:19 EST 2013

On Friday, 6 December 2013, 10:11:07, Patrick Hemmer wrote:
> I have a resource which updates DNS records (Amazon's Route53). When it
> performs its `monitor` action, it can sometimes fail because of issues
> with Amazon's API. So I want failures to be ignored for the monitor
> action, and so I set `op monitor on-fail=ignore`. However now when the
> monitor action comes back as 'stopped', pacemaker does nothing. In my
> opinion a "stopped" return code should not be a failure condition, and
> thus the `on-fail=ignore` should not apply. It basically makes the
> monitor option completely useless. It won't do anything on failure, it
> won't do anything on stopped, so you might as well not have a monitor
> action at all.
> If this is a bug I can create a bug report, just not sure if this is
> deliberate or not.

This is not a bug but expected behaviour. A monitor operation on a started 
resource interprets every result other than "Started" as a failure, including 
the case where your resource is reported as stopped.

And you told the resource to ignore failures.

It would be better to improve your resource agent to detect error conditions. 
It could read the state it should be in from pacemaker and compare it with the 

Or, the easy way out: make the migration-threshold large (+INF) and add a 
failure-timeout to your resource. That way you allow some failures of your 
resource, but forget the failures after some time.
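
As a rough sketch of that second option in crm shell configuration syntax: the
resource name p_route53, the agent ocf:custom:route53, and the interval,
timeout and failure-timeout values are all made up for illustration, and
on-fail=ignore is left out on the assumption that monitor failures should be
acted on again.

```
# Hypothetical resource and agent names; values are placeholders to adapt.
primitive p_route53 ocf:custom:route53 \
    op monitor interval=60s timeout=30s \
    meta migration-threshold=INFINITY failure-timeout=300s
```

With something like this, a failed monitor is handled by the normal recovery
instead of being ignored, the resource is never banned from the node because
of its fail count, and old failures expire after the failure-timeout.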

Of course improving the RA would be the best way.
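
One possible shape for such an improved monitor, as a minimal shell sketch and
not a finished agent: it retries when the API itself misbehaves and only then
reports an error, while a record that is definitely missing is reported as
"not running" right away. The function names check_record and route53_monitor,
the FAKE_RECORD_STATE variable and the three-way "present / absent /
api_error" contract are assumptions made up for this example; only the exit
codes are the standard OCF ones.

```shell
#!/bin/sh
# Standard OCF exit codes used by resource agents.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical probe standing in for the real Route53 lookup.
# Assumed contract: prints "present", "absent" or "api_error".
# Here it just echoes FAKE_RECORD_STATE so the sketch can run offline.
check_record() {
    echo "${FAKE_RECORD_STATE:-present}"
}

# Map the probe result to an OCF exit code.
# $1 = number of attempts before an API error is reported (default 3).
route53_monitor() {
    attempts=${1:-3}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        case "$(check_record)" in
            present) return $OCF_SUCCESS ;;      # record is there: running
            absent)  return $OCF_NOT_RUNNING ;;  # record is gone: stopped
        esac
        # Anything else is treated as a transient API problem: try again.
        i=$((i + 1))
    done
    # The API kept failing for every attempt.
    return $OCF_ERR_GENERIC
}
```

In a real agent the monitor action would call route53_monitor and exit with
its status; what to return when the API stays unreachable is a policy choice
and would need to match how the resource should behave during an outage.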


Michael Schwartzkopff

[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, District Court München: HRB 199263
Executive board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein