[Pacemaker] Howto write a STONITH agent

Christoph Herrmann C.Herrmann at science-computing.de
Fri Jan 14 16:10:17 UTC 2011


-----Original Message-----
From: Dejan Muhamedagic <dejanmm at fastmail.fm>
Sent: Fri 14.01.2011 12:31
To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Subject: Re: [Pacemaker] Howto write a STONITH agent

> Hi,
> 
> On Thu, Jan 13, 2011 at 09:09:38PM +0100, Christoph Herrmann wrote:
> > Hi,
> > 
> > I have some brand new HP Blades with iLO boards (iLO 2 Standard Blade Edition 1.81 ...)
> > But I'm not able to connect to them via the external/riloe agent.
> > When I try:
> > 
> > stonith -t external/riloe -p "hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power" -S
> 
> Try this:
> 
> stonith -t external/riloe hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S

That's much better (looks like PEBKAC ;-), thanks! But it is not reliable: I tested it about 10 times
and it hung 5 times. That's not what I want.
In the end I will use my own ssh-ilo agent. It's very simple (KISS) and reliable. The external/riloe agent did not
look too simple.
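
For illustration, here is a minimal sketch of what such an ssh-based agent could look like. It assumes the external plugin convention from cluster-glue as I understand it: the action arrives in $1, the node name in $2, the parameters come in as environment variables, and exit code 0 means success. The parameter name ilo_sshkey, the SSH_CMD override for dry runs, and the remote "power" command used for the status check are assumptions of this sketch, not taken from an existing agent.

```shell
#!/bin/sh
# Sketch of an external STONITH plugin that resets a node through its iLO via ssh.
# Assumed parameters (environment variables): hostlist, ilo_hostname, ilo_user, ilo_sshkey.
# SSH_CMD is a hypothetical override so the script can be dry-run, e.g. SSH_CMD=echo.

ilo_run() {
    # Run one command on the iLO command line.
    # BatchMode avoids hanging on a password prompt; ConnectTimeout bounds the wait.
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=10 \
        -i "$ilo_sshkey" "$ilo_user@$ilo_hostname" "$@"
}

ssh_ilo_agent() {
    action=${1:-}
    target=${2:-}
    case "$action" in
    gethosts)
        # One node name per line.
        for h in $hostlist; do echo "$h"; done
        ;;
    reset)
        # Refuse an empty target or a node this device is not responsible for.
        [ -n "$target" ] || return 1
        case " $hostlist " in
        *" $target "*) ilo_run reset system1 ;;
        *) return 1 ;;
        esac
        ;;
    status)
        # Device check only: can we log in to the iLO and run a command?
        # "power" is assumed to be a harmless iLO CLI command.
        ilo_run power >/dev/null
        ;;
    getconfignames)
        for p in hostlist ilo_hostname ilo_user ilo_sshkey; do echo "$p"; done
        ;;
    *)
        # on, off and the getinfo-* actions are left out of this sketch.
        return 1
        ;;
    esac
}

# When called as a plugin, "$1" holds the action; with no argument nothing runs.
if [ -n "${1:-}" ]; then
    ssh_ilo_agent "$@"
    exit $?
fi
```

With SSH_CMD=echo the reset action only prints the ssh arguments it would use, which makes it easy to check the command line before pointing it at a real iLO.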

So my questions still remain. Is there a HOWTO for writing stonith agents?
Is it useful to write (to run) a stonith agent as a cloned resource?
What should the status check do with a cloned stonith resource? Is it useful in any way, given that I have 4 different nodes with 4 different iLO boards?
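
To make the clone question concrete: with one iLO per node, the alternative to a clone would be one stonith primitive per node, each kept off the node it fences by a location constraint. The sketch below only prints crm shell configuration for review. The agent name external/ssh-ilo, the resource IDs, the parameter names and the node1..node4 / ilo1..ilo4 naming are all assumptions.

```shell
#!/bin/sh
# Print crm shell configuration: one stonith primitive per node plus a
# location constraint that keeps each primitive off the node it fences.
# Agent name, resource IDs, parameter names and host names are hypothetical.
print_stonith_config() {
    for n in "$@"; do
        echo "primitive st-node$n stonith:external/ssh-ilo params hostlist=node$n ilo_hostname=ilo$n ilo_user=ilouser ilo_sshkey=ilo-sshkey op monitor interval=3600s"
        echo "location l-st-node$n st-node$n -inf: node$n"
    done
}

print_stonith_config 1 2 3 4
```

In this layout each status check only covers the one iLO that the primitive is configured for.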

 
Cheers,


  Christoph &:-)


> Thanks,
> 
> Dejan
> 
> > 
> > I get the following answer:
> > 
> > external/riloe[14317]: ERROR: unknown power method %s, setting to "power"
> > external/riloe[14317]: ERROR: [Errno -2] Name or service not known, while talking to ilo_hostname=ilo1
> > 
> > ** (process:14315): CRITICAL **: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/riloe status' returned 1
> > 
> > ** (process:14315): CRITICAL **: external_status: 'riloe status' failed with rc 1
> > stonith: external/riloe device not accessible.
> > 
> > 
> > But I can access ilo1 with http, https and ssh. The easiest way to reset a node is to run:
> > 
> > ssh -i ilo-sshkey ilouser@ilo1 reset system1
> > 
> > I thought it would be easier to write a new ssh-ilo agent (I'm almost done :-) than
> > to debug the existing one. But I'm looking for a short howto. I've read some
> > STONITH agents, but they are not completely self-explanatory and I have some
> > questions. Is there a short "how to write a stonith agent" manual which Google and
> > I were not able to find?
> > Or should I post all questions to the list?
> > Here we go:
> > 
> > 1. (and most important): What does the status check do if you have an agent
> > which runs as a cloned resource (my ssh-ilo agent should run as a cloned
> > resource)? Does it check all nodes? Is it possible to check the status of a
> > single node?
> > 2. What are the expected return codes?
> > 
> > more to follow ;-)
> > 
> > 
> > 
> > 
> > regards
> > 
> > 
> >    Christoph &:-)