[ClusterLabs] ocf_take_lock is NOT actually safe to use
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Jun 26 11:05:02 EDT 2017
Hi,
On Wed, Jun 21, 2017 at 04:40:47PM +0200, Lars Ellenberg wrote:
>
> Repost to a wider audience, to raise awareness for this.
> ocf_take_lock may or may not be better than nothing.
>
> It at least "annotates" that the author would like to protect something
> that is considered a "critical region" of the resource agent.
>
> At the same time, it does NOT deliver what the name seems to imply.
>
Lars, many thanks for the analysis and bringing this up again.
I'm not going to go into the details below, just to say that
there's now a pull request for the issue:
https://github.com/ClusterLabs/resource-agents/pull/995
In short, it reduces the race window (by using mkdir*), adds a
double test for stale locks, and improves the random number
function. I ran numerous tests, with and without stale locks, and
it seems to hold up quite well.
The comments there contain a detailed description of the
approach.
Please review and comment, whoever finds the time.
Cheers,
Dejan
*) Though the current implementation uses just a file and the
proposed one uses directories, the locks are short-lived, so there
shouldn't be any problems on upgrades.
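For illustration, here is a minimal sketch of the mkdir-based idea
(simplified, not the actual PR code; the helper names are made up and
the stale-lock detection is omitted):

take_lock_mkdir() {
    local lock=$1
    # mkdir is atomic: exactly one contender can create the directory,
    # so the "creation race" of the echo-based lock goes away
    while ! mkdir "$lock" 2>/dev/null; do
        # a real implementation also detects and removes stale locks here
        sleep 1
    done
    echo $$ > "$lock/pid"
}

release_lock_mkdir() {
    rm -rf "$1"
}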
> I think I brought this up a few times over the years, but was not noisy
> enough about it, because it did not seem important enough: no-one was
> actually using it anyways.
>
> But since new usage has been recently added with
> [ClusterLabs/resource-agents] targetcli lockfile (#917)
> here goes:
>
> On Wed, Jun 07, 2017 at 02:49:41PM -0700, Dejan Muhamedagic wrote:
> > On Wed, Jun 07, 2017 at 05:52:33AM -0700, Lars Ellenberg wrote:
> > > Note: ocf_take_lock is NOT actually safe to use.
> > >
> > > As implemented, it uses "echo $pid > lockfile" to create the lockfile,
> > > which means if several such "ocf_take_lock" happen at the same time,
> > > they all "succeed", but only the last one will be the "visible" one to future waiters.
> >
> > Ugh.
>
> Exactly.
>
> Reproducer:
> #############################################################
> #!/bin/bash
> export OCF_ROOT=/usr/lib/ocf/ ;
> . /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs ;
>
> x() (
>     ocf_take_lock dummy-lock ;
>     ocf_release_lock_on_exit dummy-lock ;
>     set -C;
>     echo x > protected && sleep 0.15 && rm -f protected || touch BROKEN;
> );
>
> mkdir -p /run/ocf_take_lock_demo
> cd /run/ocf_take_lock_demo
> rm -f BROKEN; i=0;
> time while ! test -e BROKEN; do
>     x & x &
>     wait;
>     i=$(( i+1 ));
> done ;
> test -e BROKEN && echo "reproduced race in $i iterations"
> #############################################################
>
> x() above takes the "dummy-lock" and, because of the () subshell and
> ocf_release_lock_on_exit, releases it again on exit;
> within the protected region of code,
> it creates and removes a file "protected".
>
> If ocf_take_lock were correct, there could never be two instances
> inside the lock, so "echo x > protected" should never fail.
>
> With the current implementation of ocf_take_lock,
> it takes "just a few" iterations here to reproduce the race
> (usually within a minute).
>
> The races I see in ocf_take_lock:
> "creation race":
>     test -e $lock      # someone else may create it here,
>     echo $$ > $lock    # but we override it with ours anyways
>
> "still empty race":
>     test -e $lock      # maybe it already exists (open O_CREAT|O_TRUNC),
>                        # but does not yet contain the target pid,
>     pid=`cat $lock`    # so this one reads an empty pid,
>     kill -0 $pid       # and this one fails,
> and thus a "just being created" one is considered stale
>
> There are other problems around "stale pid file detection",
> but let's not go into that minefield right now.
>
> > > Maybe we should change it to
> > > ```
> > > while ! ( set -C; echo $pid > lockfile ); do
> > >     if test -e lockfile ; then
> > >         : error handling for existing lockfile, stale lockfile detection
> > >     else
> > >         : error handling for not being able to create lockfile
> > >     fi
> > > done
> > > : only reached if lockfile was successfully created
> > > ```
> > >
> > > (or use flock or other tools designed for that purpose)
> >
> > flock would probably be the easiest. mkdir would do too, but for
> > upgrade issues.
>
> and, being part of util-linux, flock should be available "everywhere".
>
> but because writing "wrappers" around flock with semantics similar to
> those intended for ocf_take_lock and ocf_release_lock_on_exit is not
> easy either, you'd usually be better off using flock directly in the RA.
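>
> for example (an illustrative sketch only; the function name and lock
> path are made up, not an existing helper):
>
> targetcli_action() {
>     (
>         # serialize all targetcli invocations from concurrent RA actions
>         flock -x 9 || exit 1
>         targetcli ls > /dev/null
>     ) 9> /run/targetcli.ocf.lock
> }
>
> the lock is held on fd 9 only for the lifetime of the subshell, and is
> released by the kernel even if the action gets killed.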
>
> so, still trying to do this with shell:
>
> "set -C" (respectively set -o noclober):
> If set, disallow existing regular files to be overwritten
> by redirection of output.
>
> normal '>' means: O_WRONLY|O_CREAT|O_TRUNC,
> set -C '>' means: O_WRONLY|O_CREAT|O_EXCL
>
> using "set -C ; echo $$ > $lock" instead of
> "test -e $lock || echo $$ > $lock"
> gets rid of the "creation race".
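>
> a quick way to see that behaviour interactively (the path is just an
> example):
>
>     set -C
>     echo $$ > /tmp/demo.lock    # first writer creates the file
>     echo $$ > /tmp/demo.lock    # second writer fails:
>                                 # "cannot overwrite existing file"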
>
> To get rid of the "still empty race",
> we'd need to play games with hardlinks:
>
> (
>     set -C
>     success=false
>     if echo $$ > .$lock ; then
>         # link(2) is atomic and fails if $lock already exists, and
>         # $lock only ever appears with the pid already written,
>         # which closes the "still empty race"
>         ln .$lock $lock && success=true
>         rm -f .$lock
>     fi
>     $success
> )
>
> That should be "good enough",
> and much better than what we have now.
>
>
>
> back to a possible "flock" wrapper,
> maybe something like this:
>
> ocf_synchronize() {
>     local lockfile=$1
>     shift
>     (
>         # fd 8 holds the lock for the duration of the subshell
>         flock -x 8 || exit 1
>         ( "$@" ) 8>&-
>     ) 8> "$lockfile"
> }
> # and then
> ocf_synchronize my_exclusive_shell_function with some args
>
> As this runs in subshells,
> it would not be able to change any variables visible
> to the rest of the script, which may limit its usefulness.
>
>
> or maybe have an ocf_rarun_synchronized(),
> that would do (directly from the flock man page):
> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -e "$0" "$0" "$@" || :
> with nonblocking and timeout variants?
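>
> the variants would presumably just pass the corresponding flock options
> (sketch only, error handling omitted):
>
> # nonblocking: give up immediately if another instance holds the lock
> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :
> # with a timeout: give up after 10 seconds
> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -e -w 10 "$0" "$0" "$@" || :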
>
>
> I'd very much like some volunteer to step forward
> and implement the actual patch...
>
>
> Cheers,
>
> Lars
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org