[ClusterLabs] crmsh configure delete for constraints

Tue Feb 9 23:39:27 EST 2016

Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
>Hi,
>
>On Tue, Feb 09, 2016 at 05:15:15PM +0300, Vladislav Bogdanov wrote:
>> 09.02.2016 16:31, Kristoffer Grönlund wrote:
>> >Vladislav Bogdanov <bubble at hoster-ok.com> writes:
>> >
>> >>Hi,
>> >>
>> >>when performing a delete operation with -F, crmsh (2.2.0) tries to
>> >>stop the resources passed as arguments and then waits for the DC to
>> >>become idle.
>> >>
>> >
>> >Hi again,
>> >
>> >I have pushed a fix that only waits for DC if any resources were
>> >actually stopped:
>> >https://github.com/ClusterLabs/crmsh/commit/164aa48
>> 
>> Great!
>> 
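For what it's worth, this is how I picture the fixed flow, as a rough sketch only. It is not the actual crmsh code; the function and its callback parameters (is_running, stop, wait_for_dc, remove) are names I made up for illustration.

```python
def delete_resources(names, is_running, stop, wait_for_dc, remove):
    """Stop running resources, wait for the DC only if needed, then remove.

    All callables are hypothetical stand-ins, not crmsh API.
    """
    # Only resources that are actually running need a stop request.
    stopped = [name for name in names if is_running(name)]
    for name in stopped:
        stop(name)

    # Waiting for the DC to become idle is pointless if nothing was stopped.
    if stopped:
        wait_for_dc()

    for name in names:
        remove(name)
    return stopped
```

With this shape, deleting a constraint or an already stopped resource never blocks on the DC.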
>> >
>> >>
>> >>Moreover, it may be worth checking the stop-orphan-resources property
>> >>and passing the stop work to pacemaker if it is set to true.
>> >
>> >I am a bit concerned that this might not be 100% reliable. I found an
>> >older discussion regarding this and the recommendation from David Vossel
>> >then was to always make sure resources were stopped before removing
>> >them, and not to rely on stop-orphan-resources to clean things up
>> >correctly. His example of when this might not work well is when removing
>> >a group, as the group members might get stopped out of order.
>> 
>> OK, I agree. That was just an idea.
>> 
>> >
>> >At the same time, I have thought before that the current functionality
>> >is not great. Having to stop resources before removing them is, if
>> >nothing else, annoying! I have a tentative change proposal for this where
>> >crmsh would stop the resources even if --force is not set, and there
>> >would be a flag to pass to stop to get it to ignore whether resources
>> >are running, since that may be useful if the resource is misconfigured
>> >and the stop action doesn't work.
>> 
>> That should result in fencing, no? I think that is an RA issue if that
>> happens.
>
>Right. Unfortunately, this case often gets too little attention;
>people typically test with good and working configurations only.
>The first time we hear about it is from some annoyed user whose
>node got fenced for no good reason. Even worse, with some bad
>configurations, it can happen that the nodes get fenced in a
>round-robin fashion, which certainly won't make your time very
>productive.
>
>> In particular, IMHO RAs should not run validate_all on the stop
>> action.
>
>I'd disagree here. If the environment is no good (bad
>installation, missing configuration and similar), then the stop
>operation probably won't do much good. Ultimately, it may depend
>on how the resource is managed. In ocf-rarun, validate_all is
>run, but then the operation is not carried out if the environment
>is invalid. In particular, the resource is considered to be
>stopped, and the stop operation exits with success. One of the
>most common cases is when the software resides on shared
>non-parallel storage.

Well, I'd reword that. Generally, an RA should not exit with an error if validation fails on stop.
Is that better?
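
To make that concrete, here is a minimal sketch of the stop handling I have in mind, written in Python only for illustration. It is not ocf-rarun's code; stop_action and its callbacks (validate, is_running, do_stop) are hypothetical names. The two exit codes are the standard OCF values.

```python
OCF_SUCCESS = 0      # standard OCF exit code: action succeeded
OCF_ERR_GENERIC = 1  # standard OCF exit code: generic failure


def stop_action(validate, is_running, do_stop):
    """Stop handler that never fails just because validation failed.

    validate, is_running and do_stop are hypothetical callables that
    return True or False.
    """
    if not validate():
        # Invalid environment (bad install, missing config, software on
        # storage that is not mounted here): treat the resource as stopped.
        return OCF_SUCCESS
    if not is_running():
        # Stopping an already stopped resource is a success.
        return OCF_SUCCESS
    # Only a real stop attempt that fails is reported as an error.
    return OCF_SUCCESS if do_stop() else OCF_ERR_GENERIC
```

Only the last branch can return an error, so a broken environment alone cannot lead to fencing on stop.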

>
>BTW, handling the stop and monitor/probe operations was the
>primary motivation to develop ocf-rarun. It's often quite
>difficult to get these things right.
>
>Cheers,
>
>Dejan
>
>
>> Best,
>> Vladislav
>> 
>> 
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org