[Pacemaker] fence_legacy, stonith and apcmastersnmp

Andrew Beekhof andrew at beekhof.net
Sun Mar 4 19:07:48 EST 2012


2012/3/2 Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu>:
> On Fri, 2 Mar 2012, Andrew Beekhof wrote:
>
>> 2012/3/2 Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu>:
>> >
>> > After upgrading to pacemaker 1.1.6, cluster-glue 1.0.8 on Debian, our
>> > working apcmastersnmp resources stopped working:
>> >
>> > Feb 29 14:22:03 atlas0 stonith: [35438]: ERROR: apcmastersnmp device not
>> > accessible.
>> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: notice: log_operation:
>> > Operation 'monitor' [35404] for device 'stonith-atlas6' returned: -2
>> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation:
>> > stonith-atlas6: Performing: stonith -t apcmastersnmp -S 161
>> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation:
>> > stonith-atlas6: Invalid config info for apcmastersnmp device
>> >
>> > Please note the strange "161" argument passed to stonith.
>> >
>> > After checking the source code and stracing stonithd, as far as I can see, the
>> > following happens:
>> >
>> > - stonithd calls fence_legacy, which steals the "port=161" parameter from
>> >  apcmastersnmp. This produces the error message
>> >  "Invalid config info for apcmastersnmp device"
>>
>> You keep saying "steals"; what do you mean by that? Where is it stolen from?
>
> fence_legacy passes the parameters to the stonith drivers via environment
> variables, except "port".

I had totally forgotten we do that.  Everything you've done makes
complete sense now.

The second part is already pushed as:
   https://github.com/beekhof/pacemaker/commit/797d740

I'll now add the first part, which passes the port as an environment variable.
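To illustrate why the missing variable matters, here is a minimal C sketch of the receiving side, assuming (as described in the thread) that fence_legacy hands each parameter to the driver as an environment variable named after it. The helpers `param_is_set` and `count_missing_params` are hypothetical; the real apcmastersnmp plugin in cluster-glue reads its configuration through its own code, which may differ.

```c
#define _POSIX_C_SOURCE 200112L  /* lets callers use setenv()/unsetenv() to simulate fence_legacy */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parameters set by the apcmastersnmp primitive in this thread. Treating
 * all three as mandatory follows the thread; the lookup is hypothetical. */
static const char *const required_params[] = { "ipaddr", "community", "port" };

/* Nonzero if environment variable `name` exists and is non-empty. */
static int param_is_set(const char *name)
{
    const char *val;

    assert(name != NULL);
    val = getenv(name);
    return val != NULL && val[0] != '\0';
}

/* Count required parameters missing from the environment,
 * reporting each missing name on stderr. */
static int count_missing_params(void)
{
    size_t i;
    int missing = 0;

    for (i = 0; i < sizeof required_params / sizeof required_params[0]; i++) {
        if (!param_is_set(required_params[i])) {
            fprintf(stderr, "missing parameter: %s\n", required_params[i]);
            missing++;
        }
    }
    return missing;
}
```

With `ipaddr` and `community` exported but `port` unset, as with the unpatched fence_legacy, `count_missing_params()` reports one missing parameter; exporting `port=161` as well brings the count to zero.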

> However "port" is mandatory for
> apcmastersnmp. I should have worded it better.
>
>> What does your config look like?
>
> Before the upgrade, the working apcmastersnmp resource was:
>
> primitive stonith-atlas5 stonith:apcmastersnmp \
>        params ipaddr="192.168.40.252" community="private" port="161" \
>        ...
>
> "ipaddr" and "community" are passed via environment variables by
> fence_legacy, but "port" is not.
>
> We converted the resource to external/rackpdu, but that cannot handle
> nodes attached to multiple outlets, so we need apcmastersnmp working
> again.
>
>> > - When stealing "port=161", fence_legacy sets the node name to the port
>> >  value and passes it to stonith, even in status mode. Therefore we
>> >  get "stonith -t apcmastersnmp -S 161"
>> > - However, stonith does not catch the invalid node parameter:
>> >
>> >        if (!(argcount == 1 || (argcount < 1
>> >        &&      (status||listhosts||listtypes||listparanames||metadata))))
>> > {
>> >                ++errors;
>> >        }
>>
>> Where is this fragment from?
>
> The C code fragments are from cluster-glue-1.0.8/lib/stonith/main.c.
>
>> >   and, even in status mode, it goes on to issue the reset request:
>> >
>> >                if (status) {
>> >                        < no exit >
>> >                }
>> >                if (listhosts) {
>> >                        < no exit >
>> >                }
>> >                if (optind < argc) {
>> >                        ...
>> >                        rc = stonith_req_reset(s, reset_type, nodename);
>> >                }
>> >
>> > Fortunately the port value does not match any node name, so no node gets
>> > killed, but the agent fails.
>> >
>> > Am I on the right track? Would the following patch fix the issue? I'm
>> > asking because I don't know why "port=" is handled separately and what
>> > the implications of deleting $opt_n below are.
>> >
>> > --- fence_legacy.orig   2012-02-29 23:03:36.594945717 +0100
>> > +++ fence_legacy        2012-03-01 14:41:46.454859212 +0100
>> > @@ -105,6 +105,7 @@
>> >        elsif ($name eq "port" )
>> >        {
>> >             $opt_n = $val;
>> > +            $ENV{$name} = $val;
>>
>> What is this for?
>
> Passing "port" similarly to the other parameters to the stonith drivers.
>
>> >         }
>> >        elsif ($name eq "stonith" )
>> >        {
>> > @@ -176,8 +177,8 @@
>> >    }
>> >    elsif ( $opt_o eq "monitor" || $opt_o eq "stat" || $opt_o eq "status" )
>> >    {
>> > -       print "Performing: $opt_s -t $opt_t -S $opt_n\n" unless defined $opt_q;
>> > -       exec "$opt_s -t $opt_t $extra_args -S $opt_n" or die "failed to exec \"$opt_s\"\n";
>> > +       print "Performing: $opt_s -t $opt_t -S\n" unless defined $opt_q;
>> > +       exec "$opt_s -t $opt_t $extra_args -S" or die "failed to exec \"$opt_s\"\n";
>>
>> I was under the impression that -S needed a node name; however, I see
>> that this isn't the case.
>> Some devices can query the state of an individual port, but it seems
>> that the stonith binary doesn't expose this.
>>
>> Does everything work when you have this patch?
>
> We'll give it a try today. It's the usual issue: we have to experiment
> on a production cluster.
>
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
>         H-1525 Budapest 114, POB. 49, Hungary
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
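The argument check and fall-through quoted from cluster-glue-1.0.8/lib/stonith/main.c can be contrasted with a stricter variant. The C sketch below reproduces the quoted condition in `original_accepts` (reduced to the `status` flag) and adds a hypothetical `check_args`/`decide_action` pair in which status mode rejects a stray positional argument such as "161" and returns before any reset request is made. These names and the `enum` types are illustrative assumptions, not the actual stonith code.

```c
#include <assert.h>

/* Hypothetical modes; the real main.c uses separate flags
 * (status, listhosts, listtypes, ...). */
enum stonith_mode { MODE_RESET, MODE_STATUS };
enum stonith_action { ACTION_USAGE_ERROR, ACTION_STATUS, ACTION_RESET };

/* The condition quoted from cluster-glue 1.0.8, reduced to the status
 * flag: nonzero means the arguments are accepted. Note that
 * argcount == 1 is accepted even when status is set. */
static int original_accepts(int status, int argcount)
{
    return argcount == 1 || (argcount < 1 && status);
}

/* Stricter check (assumption): status takes no node name, reset needs
 * exactly one. Returns 0 if acceptable, -1 otherwise. */
static int check_args(enum stonith_mode mode, int argcount)
{
    assert(argcount >= 0);
    if (mode == MODE_STATUS)
        return argcount == 0 ? 0 : -1;
    return argcount == 1 ? 0 : -1;
}

/* Status returns before the reset branch, so a status request can
 * never fall through to stonith_req_reset(). */
static enum stonith_action decide_action(enum stonith_mode mode, int argcount)
{
    if (check_args(mode, argcount) != 0)
        return ACTION_USAGE_ERROR;
    if (mode == MODE_STATUS)
        return ACTION_STATUS;   /* early return: no reset afterwards */
    return ACTION_RESET;
}
```

Rejecting the extra argument outright is only one design choice: as noted in the thread, some devices can query an individual port, and supporting that would need a different interface rather than a stricter check.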