[ClusterLabs] crmsh configure delete for constraints

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Feb 9 14:58:17 EST 2016


Hi,

On Tue, Feb 09, 2016 at 05:15:15PM +0300, Vladislav Bogdanov wrote:
> 09.02.2016 16:31, Kristoffer Grönlund wrote:
> >Vladislav Bogdanov <bubble at hoster-ok.com> writes:
> >
> >>Hi,
> >>
> >>when performing a delete operation with -F, crmsh (2.2.0) tries
> >>to stop the resources passed as arguments and then waits for the
> >>DC to become idle.
> >>
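For reference, the invocation in question looks roughly like this
("vip" is just a made-up resource id):

    # force-delete a resource, stopping it first
    crm -F configure delete vip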
> >
> >Hi again,
> >
> >I have pushed a fix that only waits for the DC if any resources
> >were actually stopped: https://github.com/ClusterLabs/crmsh/commit/164aa48
> 
> Great!
> 
> >
> >>
> >>Moreover, it may be worth checking the stop-orphan-resources
> >>property and passing the stop work to pacemaker if it is set to
> >>true.
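The property being discussed here is a regular cluster option, so it
can be inspected and set with crmsh itself, roughly like this:

    # check the current value (if unset, the pacemaker default applies)
    crm configure show | grep stop-orphan-resources
    # tell pacemaker to stop orphaned (deleted-but-still-running) resources
    crm configure property stop-orphan-resources=true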
> >
> >I am a bit concerned that this might not be 100% reliable. I found
> >an older discussion regarding this, and the recommendation from
> >David Vossel then was to always make sure resources are stopped
> >before removing them, rather than relying on stop-orphan-resources
> >to clean things up correctly. His example of when this might not
> >work well is removing a group, as the group members might get
> >stopped out of order.
> 
> OK, I agree. That was just an idea.
> 
> >
> >At the same time, I have long thought that the current behaviour
> >is not great. Having to stop resources before removing them is, if
> >nothing else, annoying! I have a tentative change proposal where
> >crmsh would stop the resources even if --force is not set, and
> >there would be a flag to pass to stop to make it ignore whether
> >resources are running, since that may be useful when a resource is
> >misconfigured and its stop action doesn't work.
> 
> That should result in fencing, no? I think that is an RA issue if
> that happens.

Right. Unfortunately, this case often gets too little attention;
people typically test only with good, working configurations. The
first time we hear about it is from some annoyed user whose node got
fenced for no good reason. Even worse, with some bad configurations
the nodes can get fenced in a round-robin fashion, which certainly
won't be a productive use of your time.

> In particular, imho RAs should not run validate_all on the stop
> action.

I'd disagree here. If the environment is no good (bad installation,
missing configuration, and the like), then the stop operation
probably won't do much good either. Ultimately, it may depend on how
the resource is managed. In ocf-rarun, validate_all is run, but the
operation is not carried out if the environment is invalid: in that
case the resource is considered stopped, and the stop operation
exits with success. One of the most common cases is when the
software resides on shared non-parallel storage.
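A minimal sketch of that pattern (the myapp_* names are made up; the
helpers come from the standard ocf-shellfuncs include):

    #!/bin/sh
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

    myapp_stop() {
        if ! myapp_validate; then
            # Environment is invalid (bad installation, shared storage
            # not mounted here, ...), so the software cannot be running
            # on this node: report the resource as stopped rather than
            # failing the stop, which would get the node fenced.
            ocf_log info "environment invalid, assuming ${OCF_RESOURCE_INSTANCE} is stopped"
            return $OCF_SUCCESS
        fi
        # ... actual stop logic for a valid environment ...
    }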

BTW, handling the stop and monitor/probe operations was the
primary motivation to develop ocf-rarun. It's often quite
difficult to get these things right.

Cheers,

Dejan


> Best,
> Vladislav
