[ClusterLabs] ocf scripts shell and local variables

Ken Gaillot kgaillot at redhat.com
Wed Aug 31 16:31:05 UTC 2016


On 08/30/2016 05:46 AM, Gabriele Bulfon wrote:
> illumos (and Solaris 11) delivers ksh93, which is fully Bourne-compatible
> but lacks the bash extension of "local" variables, which is not part of the
> Bourne shell. ksh93 supports this with the "typeset" built-in instead of
> "local".

"local" isn't Bourne or POSIX, but it isn't a bash extension either.
Apparently, it was introduced by the original Almquist shell (ash), and
so it is supported by both bash and dash. zsh also supports local, and
mksh and OpenBSD ksh have a built-in alias for local='typeset'. Vanilla
ksh (used by Solaris and derivatives) is the only shell in general use
as /bin/sh that doesn't support it.
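
A quick way to see where a given system stands is to probe each candidate
shell directly. This is only a rough sketch: the function name
supports_local and the list of shell paths are my own assumptions, so adjust
them to whatever is installed locally.

```shell
#!/bin/sh
# Hypothetical probe: does the shell given as $1 accept "local" inside a
# POSIX-style function? Returns 0 if it does, non-zero otherwise.
supports_local() {
    "$1" -c 'f() { local x=1; }; f' >/dev/null 2>&1
}

# The paths below are assumptions; shells that are not installed are skipped.
for sh in /bin/sh /bin/bash /bin/dash /bin/ksh; do
    [ -x "$sh" ] || continue
    if supports_local "$sh"; then
        echo "$sh: local supported"
    else
        echo "$sh: local NOT supported"
    fi
done
```

A shell without the built-in reports "local: not found" and the function
returns a non-zero status, which is all the probe relies on.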

Unfortunately, there is no standard way to locally scope a shell
variable, and no simple, readable way to do it that runs on both *ash
and vanilla ksh.
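
Part of the difficulty is that ksh93 only gives typeset variables function
scope in functions declared with the "function name { ... }" syntax, not in
POSIX-style "name() { ... }" functions, so a plain alias of local to typeset
does not help there. One portable workaround, with real limitations, is to
run the function body in a subshell. The sketch below uses a made-up helper
(count_words) purely for illustration; it is not code from the resource
agents.

```shell
#!/bin/sh
# Hypothetical helper, not taken from any resource agent.
# The body is wrapped in ( ... ) rather than { ... }, so the function runs
# in a subshell and its assignments never reach the caller.
count_words() (
    tmp="$*"         # a { ... } body would overwrite the caller's $tmp here
    set -- $tmp      # re-split the joined string on whitespace
    echo "$#"
)

tmp="caller value"
count_words one two three
echo "tmp is still: $tmp"
```

The trade-off is that such a function cannot set any variable for its
caller and costs a fork per call, which is why this is not a drop-in
replacement for local.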

> This is used inside the "ocf-*" scripts.
> 
> Gabriele
> 
> ----------------------------------------------------------------------------------------
> *Sonicle S.r.l. *: http://www.sonicle.com <http://www.sonicle.com/>
> *Music: *http://www.gabrielebulfon.com <http://www.gabrielebulfon.com/>
> *Quantum Mechanics : *http://www.cdbaby.com/cd/gabrielebulfon
> 
> 
> 
> ----------------------------------------------------------------------------------
> 
> From: Dejan Muhamedagic <dejanmm at fastmail.fm>
> To: gbulfon at sonicle.com Cluster Labs - All topics related to open-source
> clustering welcomed <users at clusterlabs.org>
> Date: 30 August 2016 12:20:19 CEST
> Subject: Re: [ClusterLabs] ocf scripts shell and local variables
> 
>     Hi,
> 
>     On Mon, Aug 29, 2016 at 05:08:35PM +0200, Gabriele Bulfon wrote:
>     > Sure, in fact I can change all shebangs to point to /bin/bash and
>     > it's ok.
>     > The question is about the current shebang /bin/sh, which may run into
>     > trouble (as if one pointed to a generic python but used many
>     > features specific to one version of python).
>     > Also, the question is about bash being a good option for RAs,
>     > being much heavier.
> 
>     I'd really suggest installing a smaller shell such as /bin/dash
>     and using that as /bin/sh. Isn't there a Bourne shell in Solaris?
>     If you modify the RAs it could be trouble on subsequent updates.
> 
>     Thanks,
> 
>     Dejan
> 
>     > Gabriele
>     ----------------------------------------------------------------------------------
>     > From: Dejan Muhamedagic
>     > To: kgaillot at redhat.com Cluster Labs - All topics related to
>     > open-source clustering welcomed
>     > Date: 29 August 2016 16:43:52 CEST
>     > Subject: Re: [ClusterLabs] ocf scripts shell and local variables
>     > Hi,
>     >
>     > On Mon, Aug 29, 2016 at 08:47:43AM -0500, Ken Gaillot wrote:
>     > > On 08/29/2016 04:17 AM, Gabriele Bulfon wrote:
>     > > > Hi Ken,
>     > > >
>     > > > I have been talking with the illumos guys about the shell problem.
>     > > > They all agreed that ksh (and especially the ksh93 used in illumos)
>     > > > is absolutely Bourne-compatible, and that the "local" variables used
>     > > > in the ocf shells are not Bourne syntax, but probably bash-specific.
>     > > > This means that pointing the scripts to "#!/bin/sh" is portable as
>     > > > long as the scripts really use Bourne-shell-only syntax, as any Unix
>     > > > variant may link whatever Bourne shell it likes.
>     > > > In this case, it should point to "#!/bin/bash" or whatever shell the
>     > > > script was written for.
>     > > > Also, in this case, the starting point is not the ocf-* script, but
>     > > > the original RA (IPaddr, but almost all of them).
>     > > > What about making the code base of RA and ocf-* portable?
>     > > > It may be just by changing them to point to bash, or with some kind
>     > > > of configure modifier to be able to specify the shell to use.
>     > > > Meanwhile, changing the scripts by hand to #!/bin/bash worked like a
>     > > > charm, and I will start patching.
>     > > >
>     > > > Gabriele
>     > >
>     > > Interesting, I thought local was POSIX, but it's not. It seems everyone
>     > > but Solaris implemented it:
>     > >
>     > > http://stackoverflow.com/questions/18597697/posix-compliant-way-to-scope-variables-to-a-function-in-a-shell-script
>     > >
>     > > Please open an issue at:
>     > >
>     > > https://github.com/ClusterLabs/resource-agents/issues
>     > >
>     > > The simplest solution would be to require #!/bin/bash for all RAs that
>     > > use local,
>     >
>     > This issue was raised many times, but note that /bin/bash is a
>     > shell not famous for being lean: it's great for interactive use,
>     > but not so great if you need to run a number of scripts. The
>     > complexity in bash, which is superfluous for our use case,
>     > doesn't go well with the basic principles of HA clusters.
>     >
>     > > but I'm not sure that's fair to the distros that support
>     > > local in a non-bash default shell. Another possibility would be to
>     > > modify all RAs to avoid local entirely, by using unique variable
>     > > prefixes per function.
>     >
>     > I doubt that we could write moderately complex shell scripts
>     > without the capability of limiting the variables' scope and retain
>     > our sanity at the same time.
>     >
>     > > Or, it may be possible to guard every instance of
>     > > local with a check for ksh, which would use typeset instead. Raising
>     > > the issue will allow some discussion of the possibilities.
>     >
>     > Just to mention that this is the first time someone reported
>     > running a shell which doesn't support local. Perhaps there's an
>     > option that they install a shell which does.
>     >
>     > Thanks,
>     >
>     > Dejan
>     ----------------------------------------------------------------------------------
>     > From: Ken Gaillot
>     > To: gbulfon at sonicle.com Cluster Labs - All topics related to
>     > open-source clustering welcomed
>     > Date: 26 August 2016 15:56:02 CEST
>     > Subject: Re: ocf scripts shell and local variables
>     > On 08/26/2016 08:11 AM, Gabriele Bulfon wrote:
>     > > I tried adding some debug in ocf-shellfuncs, showing env and ps -ef
>     > > in the corosync.log.
>     > > I suspect it's always using ksh, because in the env output I produced
>     > > I find this: KSH_VERSION=.sh.version
>     > > This is normally not present in the environment, unless ksh is running
>     > > the shell.
>     >
>     > The RAs typically start with #!/bin/sh, so whatever that points to on
>     > your system is what will be used.
>     >
>     > > I also tried modifying all ocf shells with "#!/usr/bin/bash" at the
>     > > beginning, no way, same output.
>     >
>     > You'd have to change the RA that includes them.
>     >
>     > > Any idea how I can change the used shell to support "local" variables?
>     >
>     > You can either edit the #!/bin/sh line at the top of each RA, or figure
>     > out how to point /bin/sh to a Bourne-compatible shell. ksh isn't
>     > Bourne-compatible, so I'd expect lots of #!/bin/sh scripts to fail with
>     > it as the default shell.
>     >
>     > > Gabriele
>     ------------------------------------------------------------------------
>     > *From:* Gabriele Bulfon
>     > *To:* kgaillot at redhat.com Cluster Labs - All topics related to
>     > open-source clustering welcomed
>     > *Date:* 26 August 2016 10:12:13 CEST
>     > *Subject:* Re: [ClusterLabs] ocf::heartbeat:IPaddr
>     > I looked into what you suggested, inside ocf-binaries and
>     > ocf-shellfuncs etc.
>     > I also found these logs in corosync.log:
>     > Aug 25 17:50:33 [2250] crmd: notice: process_lrm_event:
>     > xstorage1-xstorage2_wan2_IP_start_0:22 [
>     > /usr/lib/ocf/resource.d/heartbeat/IPaddr[71]: local: not found [No
>     > such file or
>     > directory]\n/usr/lib/ocf/resource.d/heartbeat/IPaddr[354]: local:
>     > not found [No such file or
>     > directory]\n/usr/lib/ocf/resource.d/heartbeat/IPaddr[355]: local:
>     > not found [No such file or
>     > directory]\n/usr/lib/ocf/resource.d/heartbeat/IPaddr[356]: local:
>     > not found [No such file or directory]\nocf-exit-reason:Setup
>     > problem: coul
>     > Aug 25 17:50:33 [2246] lrmd: notice: operation_finished:
>     > xstorage2_wan2_IP_start_0:3613:stderr [
>     > /usr/lib/ocf/resource.d/heartbeat/IPaddr[71]: local: not found [No
>     > such file or directory] ]
>     > Looks like the shell is not happy with the "local" variable
>     > definition.
>     > I tried running ocf-shellfuncs manually with sh and bash and they
>     > all run without errors.
>     > How can I see what shell is running these scripts?
>     ----------------------------------------------------------------------------------
>     > From: Ken Gaillot
>     > To: users at clusterlabs.org
>     > Date: 25 August 2016 18:07:42 CEST
>     > Subject: Re: [ClusterLabs] ocf::heartbeat:IPaddr
>     > On 08/25/2016 10:51 AM, Gabriele Bulfon wrote:
>     > > Hi,
>     > >
>     > > I'm advancing with this monster cluster on XStreamOS/illumos ;)
>     > > In the previous older tests I used heartbeat, and I had these lines
>     > > to take care of the swapping public IP addresses:
>     > >
>     > > primitive xstorage1_wan1_IP ocf:heartbeat:IPaddr params ip="1.2.3.4" cidr_netmask="255.255.255.0" nic="e1000g1"
>     > > primitive xstorage2_wan2_IP ocf:heartbeat:IPaddr params ip="1.2.3.5" cidr_netmask="255.255.255.0" nic="e1000g1"
>     > > location xstorage1_wan1_IP_pref xstorage1_wan1_IP 100: xstorage1
>     > > location xstorage2_wan2_IP_pref xstorage2_wan2_IP 100: xstorage2
>     > >
>     > > They get configured, but then I get this in crm status:
>     > >
>     > > xstorage1_wan1_IP (ocf::heartbeat:IPaddr): Stopped
>     > > xstorage2_wan2_IP (ocf::heartbeat:IPaddr): Stopped
>     > >
>     > > Failed Actions:
>     > > * xstorage1_wan1_IP_start_0 on xstorage1 'not installed' (5): call=20,
>     > >   status=complete, exitreason='Setup problem: couldn't find command: /usr/bin/gawk',
>     > >   last-rc-change='Thu Aug 25 17:50:32 2016', queued=1ms, exec=158ms
>     > > * xstorage2_wan2_IP_start_0 on xstorage1 'not installed' (5): call=22,
>     > >   status=complete, exitreason='Setup problem: couldn't find command: /usr/bin/gawk',
>     > >   last-rc-change='Thu Aug 25 17:50:33 2016', queued=1ms, exec=29ms
>     > > * xstorage1_wan1_IP_start_0 on xstorage2 'not installed' (5): call=22,
>     > >   status=complete, exitreason='Setup problem: couldn't find command: /usr/bin/gawk',
>     > >   last-rc-change='Thu Aug 25 17:50:30 2016', queued=1ms, exec=36ms
>     > > * xstorage2_wan2_IP_start_0 on xstorage2 'not installed' (5): call=20,
>     > >   status=complete, exitreason='Setup problem: couldn't find command: /usr/bin/gawk',
>     > >   last-rc-change='Thu Aug 25 17:50:29 2016', queued=0ms, exec=150ms
>     > >
>     > > The crm configure process already checked for the presence of the
>     > > required IPaddr shell, and it was ok.
>     > > Now it looks like it's looking for "/usr/bin/gawk", and that is
>     > > actually there!
>     > > Is there any known incompatibility with the mixed heartbeat ocf?
>     > > Should I use corosync specific ocf files or something else?
>     >
>     > "heartbeat" in this case is just an OCF provider name, and has nothing
>     > to do with the heartbeat messaging layer, other than having its origin
>     > in the same project. There actually has been a recent proposal to rename
>     > the provider to "clusterlabs" to better reflect the current reality.
>     >
>     > The "couldn't find command" message comes from the ocf-binaries shell
>     > functions. If you look at have_binary() there, it uses sed and which,
>     > and I'm guessing that fails on your OS somehow. You may need to patch it.
>     >
>     > > Thanks again!
>     > > Gabriele



