[ClusterLabs] crmsh configure delete for constraints

Wed Feb 10 12:50:32 EST 2016

On Wed, Feb 10, 2016 at 12:06:34PM +0100, Ferenc Wágner wrote:
> Dejan Muhamedagic <dejanmm at fastmail.fm> writes:
> 
> > If the environment is no good (bad installation, missing configuration
> > and similar), then the stop operation probably won't do much good.
> 
> Agreed.  It may not even know how to probe it.
> 
> > In ocf-rarun, validate_all is run, but then the operation is not
> > carried out if the environment is invalid. In particular, the resource
> > is considered to be stopped, and the stop operation exits with
> > success.
> 
> This sounds dangerous.  What if the local configuration of a node gets
> damaged while a resource is running on it?

I understand your worry, but I cannot imagine how that could
happen, except through a more serious failure such as a disk
crash, and that kind of failure should really cause fencing at
another level.

The most common case, by far, is some mistake or omission during
cluster setup. Humans tend to make mistakes. As Vladislav wrote
elsewhere in this thread, this can cause a fencing loop, which is
no fun, in particular if pacemaker is set to start on boot. It
happened to me a few times and I guess I don't need to describe
the intensity of my feelings toward computers in general and the
cluster stack in particular (not to mention the RA author).

> Eventually the cluster may
> try to stop it, think that it succeeded and start the resource on
> another node.  Now you have two instances running.  Or is the resource
> probed on each node before the start?

No, I don't think so. The probes are run only on crmd start.

> Can a probe failure save your day
> here?  Or do you only mean resource parameters by "environment" (which
> should be identical on each host, so validation would fail everywhere)?

The validation typically checks the configuration and then
whether various files (programs) and directories exist, and
sometimes whether directories are writable. It could check more,
but I would prefer to stop there.
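
To illustrate the kind of checks meant here, a minimal sketch of a
validate_all function follows. The parameter names
(OCF_RESKEY_binary, OCF_RESKEY_statedir) are hypothetical, and the
mapping of each failure to a return code is one plausible choice,
not something taken from an existing RA. The numeric values are the
standard OCF return codes, which a real agent would get from
ocf-shellfuncs rather than define itself.

```shell
#!/bin/sh
# Standard OCF return codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_PERM=4
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6

# Hypothetical validate_all: checks configuration first, then the
# existence of files and directories, then writability.
validate_all() {
    # 1. Configuration: both (hypothetical) parameters must be set.
    if [ -z "${OCF_RESKEY_binary:-}" ] || [ -z "${OCF_RESKEY_statedir:-}" ]; then
        return $OCF_ERR_CONFIGURED
    fi
    # 2. The program must exist and be executable.
    if [ ! -x "$OCF_RESKEY_binary" ]; then
        return $OCF_ERR_INSTALLED
    fi
    # 3. The state directory must exist.
    if [ ! -d "$OCF_RESKEY_statedir" ]; then
        return $OCF_ERR_INSTALLED
    fi
    # 4. The state directory must be writable.
    if [ ! -w "$OCF_RESKEY_statedir" ]; then
        return $OCF_ERR_PERM
    fi
    return $OCF_SUCCESS
}
```

None of these checks touch the running service itself, which is why
a failed validation says little about whether the resource is
actually stopped.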

Anyway, we could introduce something like an optional
emergency_stop(), which ocf-rarun would invoke if the validation
fails. And/or, say, a RUN_STOP_ANYWAY variable which would allow
stop to be run regardless. But note that it is extremely
difficult to prove or make sure that executing the RA _after_
the validate step failed is going to produce meaningful results.
In addition, there could also be FENCE_ON_INVALID_ENVIRONMENT
(to be set by the user) for the very paranoid ;-)
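
A rough sketch of how these three knobs might fit together in the
stop path is below. This is not the actual ocf-rarun code: the
function name rarun_stop, the stand-in ra_stop, and the order in
which the options are consulted are all assumptions made for
illustration. It also assumes that returning an error from stop is
what makes the cluster fence the node, which is the usual pacemaker
reaction to a failed stop when stonith is enabled.

```shell
#!/bin/sh
# Standard OCF return codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Stand-in for the RA's real stop function (hypothetical).
ra_stop() {
    echo "stop"
    return $OCF_SUCCESS
}

# Hypothetical stop dispatcher.
# $1: exit code of the validate step.
rarun_stop() {
    validate_rc=$1
    if [ "$validate_rc" -eq "$OCF_SUCCESS" ]; then
        # Environment is valid: run the normal stop.
        ra_stop
        return $?
    fi
    # From here on the environment is known to be invalid.
    if [ "${FENCE_ON_INVALID_ENVIRONMENT:-0}" = 1 ]; then
        # Report a failed stop so that the cluster can fence.
        return $OCF_ERR_GENERIC
    fi
    if [ "${RUN_STOP_ANYWAY:-0}" = 1 ]; then
        # Run the regular stop despite the failed validation.
        ra_stop
        return $?
    fi
    if command -v emergency_stop >/dev/null 2>&1; then
        # The RA defined the optional emergency_stop function.
        emergency_stop
        return $?
    fi
    # Current behaviour as described earlier in this thread: consider
    # the resource stopped and exit with success.
    return $OCF_SUCCESS
}
```

With nothing set and no emergency_stop defined, "rarun_stop 5"
runs nothing and returns success, which matches the behaviour
under discussion.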

Cheers,

Dejan

> -- 
> Thanks,
> Feri.