[ClusterLabs] Antw: Re: Antw: Re: crmsh configure delete for constraints

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Feb 10 07:55:44 EST 2016


>>> Ferenc Wágner <wferi at niif.hu> wrote on 10.02.2016 at 11:56 in message
<87mvr8n896.fsf at lant.ki.iif.hu>:
> Vladislav Bogdanov <bubble at hoster-ok.com> writes:
> 
>> If pacemaker gets an error on start, it will run stop with the same
>> set of parameters anyway, and will get the error again if it came
>> from validation and the RA does not differentiate validation for start
>> and stop. Then circular fencing over the whole cluster is triggered
>> for no reason.
>>
>> Of course, for safety, the RA could save its state if start was
>> successful and skip validation on stop only if that state is not found.
>> Otherwise a removed binary or config file would result in the resource
>> running on several nodes.
> 
> What would happen if we made the start operation return OCF_NOT_RUNNING
> if validation fails?  Or more broadly: if the start operation knows that

I think this should NOT be done, because the RA most likely doesn't actually
know that. You are trying to reduce the impact of one problem by introducing
another problem (returning an incorrect exit code).

> the resource is not running, thus a stop operation would do no good.

If the configuration is NOT correct, the cluster should try neither to start
nor to stop the resource. Maybe the cluster should remember that bad state
until the operator does a cleanup of the problem.
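
To make the trade-off concrete, here is a minimal sketch of the state-marker
idea quoted above. It is written in Python only for brevity (real OCF agents
are usually shell scripts), and the class name, the marker file and the
"config file exists" check are all hypothetical stand-ins, not taken from any
real agent. Only the numeric exit codes are the standard OCF values.

```python
import os

# Standard OCF exit codes used in this sketch.
OCF_SUCCESS = 0
OCF_ERR_CONFIGURED = 6


class SketchAgent:
    """Hypothetical agent: remembers a successful start in a marker file."""

    def __init__(self, config_path, state_dir):
        self.config_path = config_path
        # Assumption: one marker file per resource, kept in a local state dir.
        self.marker = os.path.join(state_dir, "started.marker")

    def validate(self):
        # Stand-in for real validation: the config file must exist.
        if os.path.isfile(self.config_path):
            return OCF_SUCCESS
        return OCF_ERR_CONFIGURED

    def start(self):
        rc = self.validate()
        if rc != OCF_SUCCESS:
            return rc
        # A real agent would launch the service here.
        open(self.marker, "w").close()
        return OCF_SUCCESS

    def stop(self):
        if not os.path.exists(self.marker):
            # Start never succeeded on this node, so skip validation
            # and report the resource as stopped.
            return OCF_SUCCESS
        rc = self.validate()
        if rc != OCF_SUCCESS:
            # The resource may be running and cannot be stopped cleanly:
            # report the error and let the cluster escalate.
            return rc
        # A real agent would terminate the service here.
        os.remove(self.marker)
        return OCF_SUCCESS
```

With this split, a start that fails validation is followed by a stop that
succeeds, so nothing escalates. A config file that disappears after a
successful start still makes stop fail, which is the case where escalation is
wanted.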

> From Pacemaker Explained B.4: "The cluster will not attempt to stop a
> resource that returns this for any action."  The probes could still
> return OCF_ERR_CONFIGURED, putting real info into the logs, the stop
> failure could still lead to fencing, protecting data integrity, but
> circular fencing would not happen.  I hope.
> 
> By the way, what are the reasons to run stop after a failed start?  To

Probably because the start operation is not required to be atomic; that is,
the resource could be partially started. Stop ensures the resource is
completely stopped (or otherwise fencing will do that).
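
As a rough model of that sequence (a simplification for illustration, not
Pacemaker's actual code; the function name and the two callables are
hypothetical), recovery after a failed start can be sketched as:

```python
OCF_SUCCESS = 0  # standard OCF exit code for success


def recover_after_failed_start(stop, fence):
    """Model of cleanup after a failed start.

    stop:  callable returning an OCF exit code for the stop operation
    fence: callable that fences the node (hypothetical stand-in)
    """
    rc = stop()
    if rc == OCF_SUCCESS:
        # Any partially started pieces are now known to be gone.
        return "stopped"
    # Stop failed, so the resource state is unknown; fencing is the only
    # remaining way to guarantee it is not running on this node.
    fence()
    return "fenced"
```

In this model, an agent whose stop repeats the validation that already failed
on start ends up in the "fenced" branch on every node it is tried on.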

> clean up halfway-started resources?  Besides OCF_ERR_GENERIC, the other
> error codes pretty much guarantee that the resource cannot be active.
> -- 
> Regards,
> Feri.
> 
