[ClusterLabs] Antw: Re: crmsh configure delete for constraints

Ferenc Wágner wferi at niif.hu
Wed Feb 10 10:56:53 UTC 2016


Vladislav Bogdanov <bubble at hoster-ok.com> writes:

> If pacemaker has got an error on start, it will run stop with the same
> set of parameters anyways. And will get error again if that one was
> from validation and RA does not differentiate validation for start and
> stop. And then circular fencing over the whole cluster is triggered
> for no reason.
>
> Of course, for safety, RA could save its state if start was successful
> and skip validation on stop only if that state is not found. Otherwise
> removed binary or config file would result in resource running on
> several nodes.

What would happen if we made the start operation return OCF_NOT_RUNNING
if validation fails?  Or more broadly: whenever the start operation knows
that the resource is not running, so that a stop operation would do no good.
From Pacemaker Explained B.4: "The cluster will not attempt to stop a
resource that returns this for any action."  The probes could still
return OCF_ERR_CONFIGURED, putting real info into the logs, and a stop
failure could still lead to fencing, protecting data integrity, but
circular fencing would not happen.  I hope.
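
A minimal sketch of the idea, only to make the proposal concrete.  It is
not taken from any existing resource agent: the function names, the
"config file must exist" validation and the launch/is_running callbacks
are all hypothetical.  Only the numeric return codes are the standard
OCF ones.  Whether Pacemaker really skips the stop after such a start
result is exactly the open question above.

```python
import os

# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7


def validate(config_path):
    """Hypothetical validation: the config file must exist."""
    return OCF_SUCCESS if os.path.isfile(config_path) else OCF_ERR_CONFIGURED


def start(config_path, launch):
    """Start action; launch is an assumed callable that returns True on success."""
    # Validation failed, so nothing was started.  Report "not running"
    # instead of a hard error, hoping the cluster then skips the stop.
    if validate(config_path) != OCF_SUCCESS:
        return OCF_NOT_RUNNING
    # A launch that fails halfway may leave debris, so keep the generic error.
    return OCF_SUCCESS if launch() else OCF_ERR_GENERIC


def probe(config_path, is_running):
    """Probe/monitor action; is_running is an assumed callable."""
    # The probe keeps the informative code so the logs show the real cause.
    rc = validate(config_path)
    if rc != OCF_SUCCESS:
        return rc
    return OCF_SUCCESS if is_running() else OCF_NOT_RUNNING
```

With a missing config file, start() here yields OCF_NOT_RUNNING while
probe() still yields OCF_ERR_CONFIGURED; a failed launch with a valid
config still yields OCF_ERR_GENERIC, so the usual recovery would apply.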

By the way, what are the reasons to run stop after a failed start?  To
clean up halfway-started resources?  Besides OCF_ERR_GENERIC, the other
error codes pretty much guarantee that the resource cannot be active.
-- 
Regards,
Feri.



