[ClusterLabs] ocf_take_lock is NOT actually safe to use
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Jun 26 11:05:02 EDT 2017
Hi,
On Wed, Jun 21, 2017 at 04:40:47PM +0200, Lars Ellenberg wrote:
>
> Repost to a wider audience, to raise awareness for this.
> ocf_take_lock may or may not be better than nothing.
>
> It at least "annotates" that the author would like to protect something
> that is considered a "critical region" of the resource agent.
>
> At the same time, it does NOT deliver what the name seems to imply.
>
Lars, many thanks for the analysis and bringing this up again.
I'm not going to go into the details below, just to say that
there's now a pull request for the issue:
https://github.com/ClusterLabs/resource-agents/pull/995
In short, it reduces the race window (by using mkdir*), adds a
double test for stale locks, and improves the random number
function. I ran numerous tests, with and without stale locks, and
it seems to hold up quite well.
The comments there contain a detailed description of the
approach.
Please review and comment, whoever finds the time.
Cheers,
Dejan
*) Though the current implementation uses just a file and the
proposed one uses directories, the locks are short-lived, so there
shouldn't be any problems on upgrades.
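For illustration, here is a minimal sketch of the mkdir-based idea
(simplified, not the actual PR code; the helper names are made up and
the stale-lock detection is omitted):

take_lock_mkdir() {
    local lock=$1
    # mkdir is atomic: exactly one contender can create the directory,
    # so the "creation race" of the echo-based lock goes away
    while ! mkdir "$lock" 2>/dev/null; do
        # a real implementation also detects and removes stale locks here
        sleep 1
    done
    echo $$ > "$lock/pid"
}

release_lock_mkdir() {
    rm -rf "$1"
}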
> I think I brought this up a few times over the years, but was not noisy
> enough about it, because it did not seem important enough: no-one was
> actually using it anyways.
>
> But since new usage has been recently added with
> [ClusterLabs/resource-agents] targetcli lockfile (#917)
> here goes:
>
> On Wed, Jun 07, 2017 at 02:49:41PM -0700, Dejan Muhamedagic wrote:
> > On Wed, Jun 07, 2017 at 05:52:33AM -0700, Lars Ellenberg wrote:
> > > Note: ocf_take_lock is NOT actually safe to use.
> > >
> > > As implemented, it uses "echo $pid > lockfile" to create the lockfile,
> > > which means if several such "ocf_take_lock" happen at the same time,
> > > they all "succeed", but only the last one will be the "visible" one to future waiters.
> >
> > Ugh.
>
> Exactly.
>
> Reproducer:
> #############################################################
> #!/bin/bash
> export OCF_ROOT=/usr/lib/ocf/ ;
> . /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs ;
>
> x() (
>     ocf_take_lock dummy-lock ;
>     ocf_release_lock_on_exit dummy-lock ;
>     set -C;
>     echo x > protected && sleep 0.15 && rm -f protected || touch BROKEN;
> );
>
> mkdir -p /run/ocf_take_lock_demo
> cd /run/ocf_take_lock_demo
> rm -f BROKEN; i=0;
> time while ! test -e BROKEN; do
>     x & x &
>     wait;
>     i=$(( i+1 ));
> done ;
> test -e BROKEN && echo "reproduced race in $i iterations"
> #############################################################
>
> x() above takes the "dummy-lock" and, because of the () subshell and
> ocf_release_lock_on_exit, releases it again on exit;
> within the protected region of code,
> it creates and removes a file "protected".
>
> If ocf_take_lock were correct, there could never be two instances
> inside the lock, so "echo x > protected" should never fail.
>
> With the current implementation of ocf_take_lock,
> it takes "just a few" iterations here to reproduce the race
> (usually within a minute).
>
> The races I see in ocf_take_lock:
> "creation race":
>     test -e $lock      # someone else may create it here,
>     echo $$ > $lock    # but we override it with ours anyways
>
> "still empty race":
>     test -e $lock      # maybe it already exists (open O_CREAT|O_TRUNC),
>                        # but does not yet contain the target pid,
>     pid=`cat $lock`    # so this one reads an empty pid,
>     kill -0 $pid       # and this one fails,
> and thus a "just being created" one is considered stale
>
> There are other problems around "stale pid file detection",
> but let's not go into that minefield right now.
>
> > > Maybe we should change it to
> > > ```
> > > while ! ( set -C; echo $pid > lockfile ); do
> > >     if test -e lockfile ; then
> > >         : error handling for existing lockfile, stale lockfile detection
> > >     else
> > >         : error handling for not being able to create lockfile
> > >     fi
> > > done
> > > : only reached if lockfile was successfully created
> > > ```
> > >
> > > (or use flock or other tools designed for that purpose)
> >
> > flock would probably be the easiest. mkdir would do too, but for
> > upgrade issues.
>
> and, being part of util-linux, flock should be available "everywhere".
>
> but because writing "wrappers" around flock with semantics similar to
> those intended for ocf_take_lock and ocf_release_lock_on_exit is not
> easy either, you'd usually be better off using flock directly in the RA.
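>
> for example (an illustrative sketch only; the function name and lock
> path are made up, not an existing helper):
>
> targetcli_action() {
>     (
>         # serialize all targetcli invocations from concurrent RA actions
>         flock -x 9 || exit 1
>         targetcli ls > /dev/null
>     ) 9> /run/targetcli.ocf.lock
> }
>
> the lock is held on fd 9 only for the lifetime of the subshell, and is
> released by the kernel even if the action gets killed.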
>
> so, still trying to do this with shell:
>
> "set -C" (respectively set -o noclober):
> If set, disallow existing regular files to be overwritten
> by redirection of output.
>
> normal '>' means: O_WRONLY|O_CREAT|O_TRUNC,
> set -C '>' means: O_WRONLY|O_CREAT|O_EXCL
>
> using "set -C ; echo $$ > $lock" instead of
> "test -e $lock || echo $$ > $lock"
> gets rid of the "creation race".
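>
> a quick way to see that behaviour interactively (the path is just an
> example):
>
>     set -C
>     echo $$ > /tmp/demo.lock    # first writer creates the file
>     echo $$ > /tmp/demo.lock    # second writer fails:
>                                 # "cannot overwrite existing file"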
>
> To get rid of the "still empty race",
> we'd need to play games with hardlinks:
>
> (
>     set -C
>     success=false
>     if echo $$ > .$lock ; then
>         # link(2) is atomic and fails if $lock already exists, and
>         # $lock only ever appears with the pid already written,
>         # which closes the "still empty race"
>         ln .$lock $lock && success=true
>         rm -f .$lock
>     fi
>     $success
> )
>
> That should be "good enough",
> and much better than what we have now.
>
>
>
> back to a possible "flock" wrapper,
> maybe something like this:
>
> ocf_synchronize() {
>     local lockfile=$1
>     shift
>     (
>         # fd 8 holds the lock for the duration of the subshell
>         flock -x 8 || exit 1
>         ( "$@" ) 8>&-
>     ) 8> "$lockfile"
> }
> # and then
> ocf_synchronize my_exclusive_shell_function with some args
>
> As this runs in subshells,
> it would not be able to change any variables visible
> to the rest of the script, which may limit its usefulness.
>
>
> or maybe have an ocf_rarun_synchronized(),
> that would do (directly from the flock man page):
> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -e "$0" "$0" "$@" || :
> with nonblocking and timeout variants?
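>
> the variants would presumably just pass the corresponding flock options
> (sketch only, error handling omitted):
>
> # nonblocking: give up immediately if another instance holds the lock
> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :
> # with a timeout: give up after 10 seconds
> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -e -w 10 "$0" "$0" "$@" || :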
>
>
> I'd very much like some volunteer to step forward
> and implement the actual patch...
>
>
> Cheers,
>
> Lars
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org