[ClusterLabs] ocf_take_lock is NOT actually safe to use
Jan Pokorný
jpokorny at redhat.com
Thu Jun 22 13:26:21 EDT 2017
On 21/06/17 16:40 +0200, Lars Ellenberg wrote:
> Repost to a wider audience, to raise awareness for this.
Appreciated, Lars.
Adding developers ML for possibly even larger outreach.
> ocf_take_lock may or may not be better than nothing.
>
> It at least "annotates" that the auther would like to protect something
> that is considered a "critical region" of the resource agent.
>
> At the same time, it does NOT deliver what the name seems to imply.
>
> I think I brought this up a few times over the years, but was not noisy
> enough about it, because it seemed not important enough: no-one was
> actually using this anyways.
True, I have found this reference (a leaf in the whole thread):
http://lists.linux-ha.org/pipermail/linux-ha-dev/2010-October/017801.html
> But since new usage has been recently added with
> [ClusterLabs/resource-agents] targetcli lockfile (#917)
[linked: https://github.com/ClusterLabs/resource-agents/pull/917]
> here goes:
>
> On Wed, Jun 07, 2017 at 02:49:41PM -0700, Dejan Muhamedagic wrote:
>> On Wed, Jun 07, 2017 at 05:52:33AM -0700, Lars Ellenberg wrote:
>>> Note: ocf_take_lock is NOT actually safe to use.
>>>
>>> As implemented, it uses "echo $pid > lockfile" to create the lockfile,
>>> which means if several such "ocf_take_lock" happen at the same time,
>>> they all "succeed", only the last one will be the "visible" one to future waiters.
>>
>> Ugh.
>
> Exactly.
>
> Reproducer:
> #############################################################
> #!/bin/bash
> export OCF_ROOT=/usr/lib/ocf/ ;
> . /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs ;
>
> x() (
> ocf_take_lock dummy-lock ;
> ocf_release_lock_on_exit dummy-lock ;
> set -C;
> echo x > protected && sleep 0.15 && rm -f protected || touch BROKEN;
> );
>
> mkdir -p /run/ocf_take_lock_demo
> cd /run/ocf_take_lock_demo
> rm -f BROKEN; i=0;
> time while ! test -e BROKEN; do
> x & x &
> wait;
> i=$(( i+1 ));
> done ;
> test -e BROKEN && echo "reproduced race in $i iterations"
> #############################################################
>
> x() above takes, and, because of the () subshell and
> ocf_release_lock_on_exit, releases the "dummy-lock",
> and within the protected region of code,
> creates and removes a file "protected".
>
> If ocf_take_lock was good, there could never be two instances
> inside the lock, so echo x > protected should never fail.
>
> With the current implementation of ocf_take_lock,
> it takes "just a few" iterations here to reproduce the race.
> (usually within a minute).
>
> The races I see in ocf_take_lock:
> "creation race":
> test -e $lock
> # someone else may create it here
> echo $$ > $lock
> # but we override it with ours anyways
>
> "still empty race":
> test -e $lock # maybe it already exists (open O_CREAT|O_TRUNC)
> # but does not yet contain target pid,
> pid=`cat $lock` # this one is empty,
> kill -0 $pid # and this one fails
> and thus a "just being created" one is considered stale
>
> There are other problems around "stale pid file detection",
> but let's not go into that minefield right now.
>
>>> Maybe we should change it to
>>> ```
>>> while ! ( set -C; echo $pid > lockfile ); do
>>> if test -e lockfile ; then
>>> : error handling for existing lockfile, stale lockfile detection
>>> else
>>> : error handling for not being able to create lockfile
>>> fi
>>> done
>>> : only reached if lockfile was successfully created
>>> ```
>>>
>>> (or use flock or other tools designed for that purpose)
>>
>> flock would probably be the easiest. mkdir would do too, but for
>> upgrade issues.
>
> and, being part of util-linux, flock should be available "everywhere".
>
> but because writing "wrappers" around flock similar to the intended
> semantics of ocf_take_lock and ocf_release_lock_on_exit is not easy
> either, usually you'd be better of using flock directly in the RA.
>
> so, still trying to do this with shell:
>
> "set -C" (respectively set -o noclober):
> If set, disallow existing regular files to be overwritten
> by redirection of output.
For completeness, also guaranteed with POSIX specification:
http://pubs.opengroup.org/onlinepubs/009695399/utilities/set.html
> normal '>' means: O_WRONLY|O_CREAT|O_TRUNC,
From
https://github.com/ClusterLabs/resource-agents/pull/622#issuecomment-113166800
I actually got an impression that this is shell-specific.
> set -C '>' means: O_WRONLY|O_CREAT|O_EXCL
The only thing I can add at this point (it needs more time to read up
on the proposals) is that this is another con for using "standard"
shell as an implementation language, along with, e.g., being prone
to mishandle whitespaces in the parameters being passed easily:
http://oss.clusterlabs.org/pipermail/users/2015-May/000403.html
> using "set -C ; echo $$ > $lock" instead of
> "test -e $lock || echo $$ > $lock"
> gets rid of the "creation race".
>
> To get rid of the "still empty race",
> we'd need to play games with hardlinks:
>
> (
> set -C
> success=false
> if echo $$ > .$lock ; then
> ln .$lock $lock && success=true
> rm -f .$lock
> fi
> $success
> )
>
> That should be "good enough",
> and much better than what we have now.
>
>
>
> back to a possible "flock" wrapper,
> maybe something like this:
>
> ocf_synchronize() {
> local lockfile=$1
> shift
> (
> flock -x 8 || exit 1
> ( "$@" ) 8>&-
> ) 8> "$lockfile"
> }
> # and then
> ocf_synchronize my_exclusive_shell_function with some args
>
> As this runs in subshells,
> it would not be able to change any variales visible
> to the rest of the script, which may limit its usefulness.
>
>
> or maybe, have a ocf_rarun_synchronized(),
> that will do (directly from the flock man page):
> [ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -e "$0" "$0" "$@" || :
> with nonblocking and timeout variants?
>
>
> I'd very much like some volunteer to step forward
> and implement the actual patch...
In the iterim, perhaps doc/dev-guides/ra-dev-guide.asc could be
updated, stating that the current implementation is not reliable,
which could then change into a statement about the past for those
who are combining old ocf library while consulting the newest
guide available?
--
Jan (Poki)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20170622/54bdd370/attachment-0003.sig>
More information about the Users
mailing list