[ClusterLabs] Antw: Re: ocf_take_lock is NOT actually safe to use
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Wed Jul 5 04:37:56 EDT 2017
Hi!
Could it be that I pointed out this problem about six years ago? At least I found a locking implementation for OCF RAs dating from 2011. My approach is a bit different: I use a global MUTEX lock, and the actual lock files are created while holding it:
---
# Shell functions to support file-based locking
# (c) 2011 by Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
MUTEX_LOCK=${HA_RSCTMP}/_MUTEX_LOCK_ # lock for mutual exclusion
# create the given lockfile, writing the shell's PID as owner into it
# we assume that file descriptor 123 is not used elsewhere
xola_try_lock()
{
typeset pidfile="$1"
(flock -e 123 &&
if [ -e "$pidfile" ]; then # a lock file exists
typeset locker="$(<"$pidfile")"
if ! kill -0 "$locker" > /dev/null 2>&1; then
ocf_log info "grabbing stale PID lock $pidfile from $locker"
echo $$ > "$pidfile" # overwrite stale lock file, success
else
ocf_log info "PID $locker owns lock $pidfile"
false # valid lock file exists, failure
fi
else
echo $$ > "$pidfile" # create a new lock file, success
fi) 123> "$MUTEX_LOCK"
}
# create the given lock file, waiting if necessary until the lock becomes available
xola_lock()
{
typeset delay=${2:-5} limit=${3:-500} # initial and maximum delay (hundredths of a second)
typeset sdelay # the delay formatted in seconds for sleep
while ! xola_try_lock "$1"; do
[ -w "$MUTEX_LOCK" ] || return 1
sdelay=$(printf '%d.%02d' $(expr $delay / 100) $(expr $delay % 100))
ocf_log info "waiting $sdelay seconds before re-trying lock $1"
sleep $sdelay
# use limited exponential backoff wait strategy
[ "$delay" -lt "$limit" ] && (( delay += delay / 3 ))
done
}
# check whether the given lock file is a valid lock
# we assume that file descriptor 123 is not used elsewhere
xola_check_lock()
{
typeset pidfile="$1"
(flock -e 123 &&
if [ -e "$pidfile" ]; then # a lock file exists
typeset locker="$(<"$pidfile")"
kill -0 "$locker" > /dev/null 2>&1
else
false
fi) 123> "$MUTEX_LOCK"
}
# remove the given lock file unconditionally
# we assume that file descriptor 124 is not used elsewhere
xola_unlock()
{
typeset pidfile="$1"
(flock -e 124 &&
if [ -e "$pidfile" ]; then
typeset locker="$(<"$pidfile")"
if [ "$locker" = $$ ]; then
rm "$pidfile"
else
ocf_log err "xola_unlock: lock $pidfile owned by $locker"
false
fi
else
ocf_log warn "xola_unlock: lock $pidfile was not locked"
false
fi) 124> "$MUTEX_LOCK"
}
---
So far this seems to work ;-)
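For illustration, a hypothetical RA action might wrap its critical section like this (the resource name and do_critical_start_work are made up):
---
pidfile="${HA_RSCTMP}/myresource.lock"
xola_lock "$pidfile" || exit $OCF_ERR_GENERIC # wait (with backoff) for the lock
do_critical_start_work                        # the protected region
xola_unlock "$pidfile"
---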
Regards,
Ulrich
>>> Dejan Muhamedagic <dejanmm at fastmail.fm> wrote on 26.06.2017 at 17:05 in
message <20170626150502.GB4588 at tuttle.homenet>:
> Hi,
>
> On Wed, Jun 21, 2017 at 04:40:47PM +0200, Lars Ellenberg wrote:
>>
>> Repost to a wider audience, to raise awareness for this.
>> ocf_take_lock may or may not be better than nothing.
>>
>> It at least "annotates" that the author would like to protect something
>> that is considered a "critical region" of the resource agent.
>>
>> At the same time, it does NOT deliver what the name seems to imply.
>>
>
> Lars, many thanks for the analysis and bringing this up again.
>
> I'm not going to address the details below, just to say that
> there's now a pull request for the issue:
>
> https://github.com/ClusterLabs/resource-agents/pull/995
>
> In short, it consists of reducing the race window size (by using
> mkdir*), a double test for stale locks, and an improved random
> number function. I ran numerous tests with and without stale locks
> and it seems to hold up quite well.
>
> The comments there contain a detailed description of the
> approach.
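>
> For anyone unfamiliar with the idiom: mkdir(2) is atomic, so only one
> caller can succeed in creating the lock directory. A rough sketch of
> the general pattern (not the actual PR code):
>
> while ! mkdir "$lockdir" 2>/dev/null; do
>     sleep 1  # lock held by someone else; staleness checks would go here
> done
> trap 'rmdir "$lockdir"' EXIT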
>
> Please review and comment whoever finds time.
>
> Cheers,
>
> Dejan
>
> *) Though the current implementation uses just a file and the
> proposed one uses directories, the locks are short-lived, so there
> shouldn't be problems on upgrades.
>
>> I think I brought this up a few times over the years, but was not noisy
>> enough about it, because it seemed not important enough: no-one was
>> actually using this anyway.
>>
>> But since new usage has been recently added with
>> [ClusterLabs/resource-agents] targetcli lockfile (#917)
>> here goes:
>>
>> On Wed, Jun 07, 2017 at 02:49:41PM -0700, Dejan Muhamedagic wrote:
>> > On Wed, Jun 07, 2017 at 05:52:33AM -0700, Lars Ellenberg wrote:
>> > > Note: ocf_take_lock is NOT actually safe to use.
>> > >
>> > > As implemented, it uses "echo $pid > lockfile" to create the lockfile,
>> > > which means if several such "ocf_take_lock" happen at the same time,
>> they all "succeed", but only the last one will be the "visible" one
>> to future waiters.
>> >
>> > Ugh.
>>
>> Exactly.
>>
>> Reproducer:
>> #############################################################
>> #!/bin/bash
>> export OCF_ROOT=/usr/lib/ocf/ ;
>> . /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs ;
>>
>> x() (
>> ocf_take_lock dummy-lock ;
>> ocf_release_lock_on_exit dummy-lock ;
>> set -C;
>> echo x > protected && sleep 0.15 && rm -f protected || touch BROKEN;
>> );
>>
>> mkdir -p /run/ocf_take_lock_demo
>> cd /run/ocf_take_lock_demo
>> rm -f BROKEN; i=0;
>> time while ! test -e BROKEN; do
>> x & x &
>> wait;
>> i=$(( i+1 ));
>> done ;
>> test -e BROKEN && echo "reproduced race in $i iterations"
>> #############################################################
>>
>> x() above takes the "dummy-lock" and, because of the () subshell
>> and ocf_release_lock_on_exit, releases it again; within the
>> protected region of code it creates and removes a file "protected".
>>
>> If ocf_take_lock were correct, there could never be two instances
>> inside the lock, so "echo x > protected" should never fail.
>>
>> With the current implementation of ocf_take_lock,
>> it takes "just a few" iterations here to reproduce the race.
>> (usually within a minute).
>>
>> The races I see in ocf_take_lock:
>> "creation race":
>> test -e $lock
>> # someone else may create it here
>> echo $$ > $lock
>> # but we override it with ours anyways
>>
>> "still empty race":
>> test -e $lock # maybe it already exists (open O_CREAT|O_TRUNC)
>> # but does not yet contain target pid,
>> pid=`cat $lock` # this one is empty,
>> kill -0 $pid # and this one fails
>> and thus a "just being created" one is considered stale
>>
>> There are other problems around "stale pid file detection",
>> but let's not go into that minefield right now.
>>
>> > > Maybe we should change it to
>> > > ```
>> > > while ! ( set -C; echo $pid > lockfile ); do
>> > > if test -e lockfile ; then
>> > > : error handling for existing lockfile, stale lockfile detection
>> > > else
>> > > : error handling for not being able to create lockfile
>> > > fi
>> > > done
>> > > : only reached if lockfile was successfully created
>> > > ```
>> > >
>> > > (or use flock or other tools designed for that purpose)
>> >
>> > flock would probably be the easiest. mkdir would do too, but for
>> > upgrade issues.
>>
>> and, being part of util-linux, flock should be available "everywhere".
>>
>> but because writing "wrappers" around flock similar to the intended
>> semantics of ocf_take_lock and ocf_release_lock_on_exit is not easy
>> either, usually you'd be better off using flock directly in the RA.
>>
>> so, still trying to do this with shell:
>>
>> "set -C" (respectively set -o noclober):
>> If set, disallow existing regular files to be overwritten
>> by redirection of output.
>>
>> normal '>' means: O_WRONLY|O_CREAT|O_TRUNC,
>> set -C '>' means: O_WRONLY|O_CREAT|O_EXCL
>>
>> using "set -C ; echo $$ > $lock" instead of
>> "test -e $lock || echo $$ > $lock"
>> gets rid of the "creation race".
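>>
>> A quick way to see the noclobber semantics, e.g. in bash:
>>
>> $ set -C
>> $ echo $$ > /tmp/lk
>> $ echo $$ > /tmp/lk
>> bash: /tmp/lk: cannot overwrite existing file
>>
>> the second redirection fails because the open uses O_EXCL.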
>>
>> To get rid of the "still empty race",
>> we'd need to play games with hardlinks:
>>
>> (
>> set -C
>> success=false
>> if echo $$ > .$lock ; then
>> ln .$lock $lock && success=true
>> rm -f .$lock
>> fi
>> $success
>> )
>>
>> That should be "good enough": the lock only becomes visible under
>> $lock once it already contains the PID, and the ln fails if $lock
>> already exists. Much better than what we have now.
>>
>>
>>
>> back to a possible "flock" wrapper,
>> maybe something like this:
>>
>> ocf_synchronize() {
>> local lockfile=$1
>> shift
>> (
>> flock -x 8 || exit 1
>> ( "$@" ) 8>&-
>> ) 8> "$lockfile"
>> }
>> # and then
>> ocf_synchronize my_exclusive_shell_function with some args
>>
>> As this runs in subshells,
>> it would not be able to change any variables visible
>> to the rest of the script, which may limit its usefulness.
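>> (One could capture stdout instead, e.g.
>> result=$(ocf_synchronize some_function args)
>> but that only transports text out of the subshell, not shell state.)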
>>
>>
>> or maybe have an ocf_rarun_synchronized(),
>> that will do (directly from the flock man page):
>> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -e "$0" "$0" "$@" ||
> :
>> with nonblocking and timeout variants?
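>> e.g. (untested), using flock's own -n and -w options:
>> # fail immediately if the lock is already held:
>> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :
>> # or give up after a 10 second timeout:
>> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -e -w 10 "$0" "$0" "$@" || :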
>>
>>
>> I'd very much like some volunteer to step forward
>> and implement the actual patch...
>>
>>
>> Cheers,
>>
>> Lars
>>
>>