[ClusterLabs] Antw: Re: crmsh configure delete for constraints
Vladislav Bogdanov
bubble at hoster-ok.com
Wed Feb 10 12:01:47 UTC 2016
10.02.2016 13:56, Ferenc Wágner wrote:
> Vladislav Bogdanov <bubble at hoster-ok.com> writes:
>
>> If Pacemaker gets an error on start, it will run stop with the same
>> set of parameters anyway, and it will get the error again if that error
>> came from validation and the RA does not differentiate validation for
>> start and stop. Then circular fencing over the whole cluster is
>> triggered for no reason.
>>
>> Of course, for safety, the RA could save its state if start was
>> successful and skip validation on stop only if that state is not found.
>> Otherwise a removed binary or config file would result in the resource
>> running on several nodes.
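A minimal sketch of that idea in Python, purely for illustration (real agents are usually shell scripts). The state file, the `config_ok` flag standing in for real validation, and the `SketchAgent` class are all hypothetical; only the OCF return-code values are standard. It assumes the state file lives on node-local storage that does not survive a reboot, and a real agent would probably still check for a stray process even when no state file exists.

```python
import os
import tempfile

# Standard OCF return codes (numeric values from the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_CONFIGURED = 6


class SketchAgent:
    """Toy resource agent. The state file and the config_ok flag are
    hypothetical stand-ins, not part of any real agent."""

    def __init__(self, state_file, config_ok=True):
        self.state_file = state_file  # assumed node-local, cleared on reboot
        self.config_ok = config_ok    # stands in for the agent's real validation

    def validate(self):
        return OCF_SUCCESS if self.config_ok else OCF_ERR_CONFIGURED

    def start(self):
        rc = self.validate()
        if rc != OCF_SUCCESS:
            return rc  # nothing was started, and no state is recorded
        # ... start the real service here ...
        with open(self.state_file, "w") as f:
            f.write("started\n")  # remember that start succeeded on this node
        return OCF_SUCCESS

    def stop(self):
        if not os.path.exists(self.state_file):
            # Start never succeeded here, so skip validation and report a
            # clean stop instead of failing and triggering fencing.
            return OCF_SUCCESS
        rc = self.validate()
        if rc != OCF_SUCCESS:
            # The service may still be running but can no longer be managed
            # (e.g. its config was removed); fail so the cluster escalates
            # rather than starting a second copy elsewhere.
            return rc
        # ... stop the real service here ...
        os.remove(self.state_file)
        return OCF_SUCCESS


if __name__ == "__main__":
    state = os.path.join(tempfile.mkdtemp(), "state")
    broken = SketchAgent(state, config_ok=False)
    print(broken.start(), broken.stop())  # prints "6 0"
```

With a broken config, start fails with OCF_ERR_CONFIGURED, but the following stop succeeds because no state was ever recorded; if the config breaks only after a successful start, stop fails and the cluster can escalate.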
>
> What would happen if we made the start operation return OCF_NOT_RUNNING
> if validation fails? Or more broadly: if the start operation knows that
> the resource is not running, so a stop operation would do no good.
Well, then the cluster will try to start it again, and that could be
undesirable. And what are OCF_ERR_INSTALLED and OCF_ERR_CONFIGURED for, then?
> From Pacemaker Explained B.4: "The cluster will not attempt to stop a
> resource that returns this for any action." The probes could still
> return OCF_ERR_CONFIGURED, putting real info into the logs; the stop
> failure could still lead to fencing, protecting data integrity, but
> circular fencing would not happen. I hope.
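The scenario being argued about can be put into a toy Python model. This is not Pacemaker's scheduler: it ignores the distinction between error severities, failure counts, and everything else the real policy weighs. It only assumes what the thread describes: every node has the same broken setup, any failed start except OCF_NOT_RUNNING is followed by a stop on that node, a failed stop gets the node fenced, and the resource is then tried on the next node. The function `place_resource` and its parameters are hypothetical.

```python
# Standard OCF return codes used below.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def place_resource(nodes, start_rc, stop_validates):
    """Toy model: try each node in turn; return (started_on, fenced_nodes).

    start_rc: what start returns on every node (same broken setup everywhere)
    stop_validates: True if stop repeats the validation that failed in start
    """
    fenced = []
    for node in nodes:
        if start_rc == OCF_SUCCESS:
            return node, fenced
        if start_rc != OCF_NOT_RUNNING:
            # Any other start failure is followed by a stop on the same node.
            stop_rc = start_rc if stop_validates else OCF_SUCCESS
            if stop_rc != OCF_SUCCESS:
                fenced.append(node)  # failed stop: this node gets fenced
        # Either way, the model moves on and tries the next node.
    return None, fenced
```

For example, `place_resource(["node1", "node2", "node3"], OCF_ERR_INSTALLED, True)` fences all three nodes, while the same call with `stop_validates=False`, or with `start_rc=OCF_NOT_RUNNING`, fences none. In every case the resource is left stopped.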
>
> By the way, what are the reasons to run stop after a failed start? To
> clean up halfway-started resources? Besides OCF_ERR_GENERIC, the other
> error codes pretty much guarantee that the resource cannot be active.
That heavily depends on how a given RA is implemented...
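A hypothetical Python illustration of why a non-generic error code alone does not prove the resource is inactive. Here, start performs two steps, and the second fails with OCF_ERR_CONFIGURED after the first has already left something running. The `HalfStartAgent` class and its "helper" are invented for this sketch; only the return-code values are standard.

```python
# Standard OCF return codes used below.
OCF_SUCCESS = 0
OCF_ERR_CONFIGURED = 6


class HalfStartAgent:
    """Hypothetical agent whose start has two steps; the second one can fail
    with a configuration error after the first has already taken effect."""

    def __init__(self, main_config_ok):
        self.main_config_ok = main_config_ok
        self.helper_running = False  # e.g. a helper daemon or a mounted filesystem

    def start(self):
        self.helper_running = True  # step 1 succeeds and leaves something active
        if not self.main_config_ok:
            return OCF_ERR_CONFIGURED  # step 2 fails, but step 1 is not undone
        return OCF_SUCCESS

    def stop(self):
        self.helper_running = False  # cleans up whatever start left behind
        return OCF_SUCCESS
```

With such an agent, skipping stop after the failed start would leave the helper running, which is exactly the halfway-started case that the stop after a failed start is meant to clean up.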