[Pacemaker] Two stonith agents - stonith_multi and multipdu

Wed Nov 24 11:30:55 EST 2010

Hi,

On Tue, Nov 23, 2010 at 12:37:42PM +0200, Chris Picton wrote:
> Hi all
> 
> I had said last week I would post these, but it slipped my mind.
> 
> I have two stonith scripts here - they may not be complete, but they
> work in my circumstances.  Feel free to take these and
> update/modify/fix/improve.
> 
> 1. stonith_multi
> This is a wrapper around multiple stonith agents to allow the use of
> more than one agent.  My use case was that my primary agent was not
> reliably connected to the servers for which it needed to act, and the
> monitor would often time out (once a week or so).  As I did not want to
> be constantly cleaning up the failed resource, I wrote this script to
> fall back to a backup agent.  If at least one of the agents reports
> success, the script will be successful.  In my case, this meant that when
> the primary agent timed out but the secondary succeeded, the monitor
> action did not fail.  The same applies to the stonith action.

Why didn't you just put a resource cleanup in a cron job? BTW,
we once had a discussion about allowing some resources' monitor
ops to fail occasionally.
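
The "at least one agent succeeds" behaviour described above comes down to trying the agents in order and stopping at the first success. A minimal Python sketch, assuming a hypothetical `run_agent(agent, action)` callable that returns True on success and raises `TimeoutError` when an agent hangs (the post doesn't show how stonith_multi actually invokes its sub-agents, and the real script is not Python):

```python
from typing import Callable, Sequence

# Hypothetical runner signature: run_agent(agent_name, action) -> bool.
# A hung agent is assumed to surface as TimeoutError.
AgentRunner = Callable[[str, str], bool]


def run_any(agents: Sequence[str], action: str, run_agent: AgentRunner) -> bool:
    """Return True as soon as one agent reports success for `action`."""
    for agent in agents:
        try:
            if run_agent(agent, action):
                return True  # primary (or a later fallback) worked
        except TimeoutError:
            pass  # treat a timeout like a failure and try the next agent
    return False  # every agent failed or timed out


# Usage example mirroring the post: the primary PDU agent times out,
# the IPMI fallback succeeds.
def demo_runner(agent: str, action: str) -> bool:
    if agent == "external/teracopdu":
        raise TimeoutError(agent)
    return agent == "external/ecnipmi"


print(run_any(["external/teracopdu", "external/ecnipmi"], "status", demo_runner))  # True
```

Note that the agents run one after another, so with the `timeout="60s"` monitor op from the configuration below, a fallback only gets whatever time the primary left unused.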

> A useful update to this would be a parameter like mode=all/any, which
> would change the mode of operation of stonith_multi to require all
> agents to succeed (mode=all), or at least one (mode=any).

Right, mode=all is something which we currently don't have,
unless supported by a plugin.
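
A mode=all/any parameter like the one proposed could be sketched as follows. This is an illustration only: `run_multi` and the `run_agent(agent, action)` callable are hypothetical names, and a timeout is assumed to show up as `TimeoutError`.

```python
from typing import Callable, Sequence

# Hypothetical runner signature: run_agent(agent_name, action) -> bool.
AgentRunner = Callable[[str, str], bool]


def _attempt(run_agent: AgentRunner, agent: str, action: str) -> bool:
    """Call one agent; a timeout counts as a failure."""
    try:
        return bool(run_agent(agent, action))
    except TimeoutError:
        return False


def run_multi(agents: Sequence[str], action: str,
              run_agent: AgentRunner, mode: str = "any") -> bool:
    """mode="any": one success suffices; mode="all": every agent must succeed."""
    if mode not in ("any", "all"):
        raise ValueError(f"mode must be 'any' or 'all', got {mode!r}")
    if not agents:
        return False  # nothing configured, so nothing could have succeeded
    for agent in agents:
        ok = _attempt(run_agent, agent, action)
        if mode == "any" and ok:
            return True   # short-circuit on the first success
        if mode == "all" and not ok:
            return False  # short-circuit on the first failure
    # Loop completed: in "all" mode every agent succeeded,
    # in "any" mode none did.
    return mode == "all"


def only_ipmi_works(agent: str, action: str) -> bool:
    return agent == "external/ecnipmi"


agents = ["external/teracopdu", "external/ecnipmi"]
print(run_multi(agents, "status", only_ipmi_works, mode="any"))  # True
print(run_multi(agents, "status", only_ipmi_works, mode="all"))  # False
```

In "all" mode this sketch stops at the first failing agent; for a reset one might instead want to attempt every agent anyway and only then report failure. That is a policy choice for whoever implements the parameter.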

> The configuration is a bit verbose, but is as follows (it supports up
> to 3 agents with 7 params each, which is easily tweakable in the file):
> 
> primitive STONITH stonith:external/stonith_multi \
> 	params \
> 	agent1="external/teracopdu" \
> 	agent1_param1name="username" \
> 	agent1_param1val="chris@mydomain" \
> 	agent1_param2name="password" \
> 	agent1_param2val="mypassword" \
> 	agent1_param3name="proxy" \
> 	agent1_param3val="http://10.1.2.48:3128" \
> 	agent2="external/ecnipmi" \
> 	agent2_param1name="hostname1" \
> 	agent2_param1val="myhostname1.mydomain" \
> 	agent2_param2name="ipaddr1" \
> 	agent2_param2val="10.24.10.31" \
> 	agent2_param3name="hostname2" \
> 	agent2_param3val="myhostname2.mydomain" \
> 	agent2_param4name="ipaddr2" \
> 	agent2_param4val="10.24.10.32" \
> 	agent2_param5name="userid" \
> 	agent2_param5val="root" \
> 	agent2_param6name="passwd" \
> 	agent2_param6val="myipmipassword" \
> 	op monitor interval="3m" timeout="60s"

Quite a mouthful, but I guess there's no other way around
it. Anyway, there's an LRM feature planned which will allow lrmd
to read resource configuration from files. It is primarily meant
for stonith, so that passwords won't be visible in the CIB
anymore.
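
Assuming that, as with other external stonith plugins, the instance parameters reach the script as environment variables, the agentN / agentN_paramMname / agentN_paramMval convention in the configuration above can be unpacked into per-agent parameter sets. A Python sketch of that mapping only (`parse_agents` is a hypothetical helper; the actual script is not Python):

```python
from typing import Dict, List, Mapping, Tuple

# Limits quoted in the post: up to 3 agents, with 7 params each.
MAX_AGENTS = 3
MAX_PARAMS = 7


def parse_agents(env: Mapping[str, str]) -> List[Tuple[str, Dict[str, str]]]:
    """Turn agentN / agentN_paramMname / agentN_paramMval into (agent, params)."""
    agents = []
    for n in range(1, MAX_AGENTS + 1):
        agent = env.get(f"agent{n}")
        if not agent:
            continue  # this agent slot is not configured
        params: Dict[str, str] = {}
        for m in range(1, MAX_PARAMS + 1):
            name = env.get(f"agent{n}_param{m}name")
            if name:
                params[name] = env.get(f"agent{n}_param{m}val", "")
        agents.append((agent, params))
    return agents


example = {
    "agent1": "external/teracopdu",
    "agent1_param1name": "username",
    "agent1_param1val": "chris@mydomain",
    "agent2": "external/ecnipmi",
    "agent2_param1name": "hostname1",
    "agent2_param1val": "myhostname1.mydomain",
}
print(parse_agents(example))
# [('external/teracopdu', {'username': 'chris@mydomain'}),
#  ('external/ecnipmi', {'hostname1': 'myhostname1.mydomain'})]
```

In a plugin you would pass `os.environ` rather than the example dict; raising the limits only means changing the two constants.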

> The second one (multipdu) is an agent which can connect to multiple PDUs
> and control all outlets associated with the host targeted by the action.
> 
> It is actually in two parts: the first is the agent itself, which is a
> simple wrapper around a Perl script that does the actual work.  While
> writing this I noticed a bug (or feature?): the agent can only control
> nodes specified in /etc/ha.d/ha.cf, but if there are no node
> specifications in /etc/ha.d/ha.cf, it returns the full outlet list
> from the PDU.

Hmm, ha.cf shouldn't influence the plugin. What happens if it's
running with corosync?
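
ha.cf is a Heartbeat configuration file, so a corosync-based cluster may have no node entries there at all. One way to avoid the dependency is to derive the host list and the per-host outlets purely from what the PDUs report. A Python sketch, assuming the outlet labels have already been read from each PDU (the `outlets` snapshot and the node names are hypothetical):

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot of outlet labels per PDU; list index 0 is outlet 1.
OutletMap = Dict[str, List[str]]


def parse_pduip(value: str) -> List[str]:
    """Split the comma-separated pduip parameter, ignoring blanks."""
    return [ip.strip() for ip in value.split(",") if ip.strip()]


def gethosts(outlets: OutletMap) -> List[str]:
    """Every distinct, non-empty outlet label, in first-seen order (no ha.cf lookup)."""
    hosts: List[str] = []
    for labels in outlets.values():
        for label in labels:
            if label and label not in hosts:
                hosts.append(label)
    return hosts


def outlets_for_host(outlets: OutletMap, host: str) -> List[Tuple[str, int]]:
    """All (pdu, outlet_number) pairs feeding `host`, across every PDU."""
    return [(pdu, number)
            for pdu, labels in outlets.items()
            for number, label in enumerate(labels, start=1)
            if label == host]


pdus = parse_pduip("10.12.3.190,10.12.3.191")
outlets = {pdus[0]: ["node1", "node2", ""], pdus[1]: ["node2", "node1"]}
print(gethosts(outlets))                   # ['node1', 'node2']
print(outlets_for_host(outlets, "node1"))  # [('10.12.3.190', 1), ('10.12.3.191', 2)]
```

A host fed by two power supplies on different PDUs gets both outlets back, which is what an off/reset has to switch.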

> As most PDUs don't allow arbitrary-length outlet names, the agent
> accepts a domain parameter, so that you can use short hostnames on the
> PDU but map them to fully qualified hostnames in Pacemaker:

Good idea.
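
The domain mapping works in both directions: short PDU labels become the names Pacemaker uses, and a host name from Pacemaker is shortened before it is looked up on the PDU. A minimal Python sketch (the function names and `node1` are hypothetical; the domain is the one from the example configuration):

```python
def to_cluster_name(pdu_label: str, domain: str) -> str:
    """Short PDU outlet label -> host name as Pacemaker knows it."""
    if not domain or "." in pdu_label:
        return pdu_label  # no domain configured, or label already qualified
    return f"{pdu_label}.{domain}"


def to_pdu_label(hostname: str, domain: str) -> str:
    """Pacemaker host name -> short label stored on the PDU."""
    suffix = f".{domain}"
    if domain and hostname.endswith(suffix):
        return hostname[: -len(suffix)]
    return hostname  # different domain or already short: pass through


domain = "ecntelecoms.za.net"
print(to_cluster_name("node1", domain))                  # node1.ecntelecoms.za.net
print(to_pdu_label("node1.ecntelecoms.za.net", domain))  # node1
```

The suffix comparison here is case-sensitive; since host names are not, a real agent may want to lowercase both sides first.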

> 
> primitive STONITH stonith:external/ecnpdu \
> 	params \
> 	pduip="10.12.3.190,10.12.3.191" \
> 	community="privatestring" \
> 	domain="ecntelecoms.za.net" \
> 	op monitor interval="3m" timeout="15s"
> 
> The control script (/sbin/pdu-control) can manage multiple PDU types,
> based on their sysObjectID.
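
Selecting a driver by sysObjectID (the MIB-II object 1.3.6.1.2.1.1.2.0, which identifies the device type) is essentially a longest-prefix match on the returned OID. pdu-control itself is Perl; this Python sketch only shows the dispatch idea. The driver names and the longer table entry are hypothetical; 1.3.6.1.4.1.318 is APC's enterprise subtree.

```python
from typing import Dict, Optional

# MIB-II sysObjectID.0; its value identifies the vendor/model of the device.
SYS_OBJECT_ID = "1.3.6.1.2.1.1.2.0"

# Hypothetical driver table keyed by OID prefix. The first entry is APC's
# enterprise subtree; the second is an illustrative, more specific model.
DRIVERS: Dict[str, str] = {
    "1.3.6.1.4.1.318": "apc",
    "1.3.6.1.4.1.318.1.3.99": "apc_model_x",
}


def pick_driver(sys_object_id: str, drivers: Dict[str, str] = DRIVERS) -> Optional[str]:
    """Longest-prefix match of a sysObjectID value against the driver table."""
    oid = sys_object_id.strip().lstrip(".")  # SNMP tools often print a leading dot
    best_prefix = ""
    best_driver = None
    for prefix, driver in drivers.items():
        # Match whole arcs only, so ...318 does not match ...3180.
        matches = oid == prefix or oid.startswith(prefix + ".")
        if matches and len(prefix) > len(best_prefix):
            best_prefix, best_driver = prefix, driver
    return best_driver


print(pick_driver(".1.3.6.1.4.1.318.1.3.99.2"))  # apc_model_x
print(pick_driver("1.3.6.1.4.1.318.1.1.1"))      # apc
print(pick_driver("1.3.6.1.4.1.3180.1"))         # None
```

An unknown device yields None, so the agent can refuse to operate rather than guess at an outlet MIB.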

I didn't take a close look, but all the scripts look quite good. If
you want to contribute them to the project, you'll need to add
copyright notices and licenses, and then brush them up a bit too.

What I've noticed so far is that logging is sort of haphazard,
probably adjusted to your environment. It should use ha_log.sh,
like the rest of the plugins. And that grep of ha.cf should go.

Thanks,

Dejan

> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker