[ClusterLabs] Antw: Re: crmsh configure delete for constraints

Vladislav Bogdanov bubble at hoster-ok.com
Wed Feb 10 12:01:47 UTC 2016


10.02.2016 13:56, Ferenc Wágner wrote:
> Vladislav Bogdanov <bubble at hoster-ok.com> writes:
>
>> If pacemaker gets an error on start, it will run stop with the same
>> set of parameters anyway. It will then get the error again if that
>> error came from validation and the RA does not differentiate validation
>> for start and stop. Circular fencing over the whole cluster is then
>> triggered for no reason.
>>
>> Of course, for safety, the RA could save its state if start was
>> successful and skip validation on stop only if that state is not found.
>> Otherwise a removed binary or config file would result in the resource
>> running on several nodes.
>
> What would happen if we made the start operation return OCF_NOT_RUNNING

Well, then the cluster will try to start it again, and that could be
undesirable. What are OCF_ERR_INSTALLED and OCF_ERR_CONFIGURED for then?

> if validation fails?  Or more broadly: if the start operation knows that
> the resource is not running, thus a stop operation would do no good.
> From Pacemaker Explained B.4: "The cluster will not attempt to stop a
> resource that returns this for any action."  The probes could still
> return OCF_ERR_CONFIGURED, putting real info into the logs, the stop
> failure could still lead to fencing, protecting data integrity, but
> circular fencing would not happen.  I hope.
>
> By the way, what are the reasons to run stop after a failed start?  To
> clean up halfway-started resources?  Besides OCF_ERR_GENERIC, the other
> error codes pretty much guarantee that the resource cannot be active.

That heavily depends on how a given RA is implemented...
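
As a rough illustration of the saved-state idea quoted above (record a
successful start, and validate on stop only when that record exists), here
is a minimal Python sketch. It is not taken from any real resource agent:
the function names, the marker file, and the binary/config paths are all
hypothetical, and a real agent would actually start and stop the service
where the comments indicate. Only the numeric OCF return codes are standard.

```python
import os
import tempfile

# Standard OCF return codes.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6


def validate(binary, config):
    """Hypothetical validation: the binary and the config file must exist."""
    if not os.path.exists(binary):
        return OCF_ERR_INSTALLED
    if not os.path.exists(config):
        return OCF_ERR_CONFIGURED
    return OCF_SUCCESS


def start(binary, config, marker):
    rc = validate(binary, config)
    if rc != OCF_SUCCESS:
        return rc
    # A real agent would launch the service here.
    with open(marker, "w") as f:
        f.write("started\n")  # remember that start succeeded on this node
    return OCF_SUCCESS


def stop(binary, config, marker):
    if not os.path.exists(marker):
        # No record of a successful start: nothing can be running here,
        # so skip validation and report a clean stop.
        return OCF_SUCCESS
    # The resource may really be running, so a validation failure
    # must be reported (and may legitimately lead to fencing).
    rc = validate(binary, config)
    if rc != OCF_SUCCESS:
        return rc
    # A real agent would stop the service here.
    os.remove(marker)
    return OCF_SUCCESS


def demo():
    """Walk through a failed start, then a start followed by a broken stop."""
    with tempfile.TemporaryDirectory() as d:
        binary = os.path.join(d, "daemon")
        config = os.path.join(d, "daemon.conf")
        marker = os.path.join(d, "started.marker")

        failed_start = start(binary, config, marker)       # binary missing
        stop_after_failed = stop(binary, config, marker)   # no marker

        open(binary, "w").close()
        open(config, "w").close()
        good_start = start(binary, config, marker)

        os.remove(binary)                                   # binary vanishes
        stop_when_broken = stop(binary, config, marker)     # marker present
        return failed_start, stop_after_failed, good_start, stop_when_broken
```

In this sketch a stop that follows a failed start returns success, so the
validation error alone does not escalate to fencing, while a stop on a node
where start had succeeded still reports the error. For this to be safe the
marker would have to live somewhere that is cleared on reboot, so a stale
record cannot outlive the resource.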
