[ClusterLabs Developers] ProfitBricks STONITH fencing agent development

Dejan Muhamedagic dejanmm at fastmail.fm
Wed May 13 08:05:32 UTC 2015


Hi,

On Sun, May 10, 2015 at 12:19:49AM -0300, Tiago Santos wrote:
> Hello folks,
> 
> 
> I've been developing an initial version of a fencing agent to allow
> management of Profitbricks VMs (http://profitbricks.com/).
> 
> The initial code can be seen at
> https://github.com/tetreis/profitbricks_stonith_plugin/blob/master/profitbricks
> 
> 
> The fencing agent uses the ProfitBricks SOAP API, which is documented at
> https://devops.profitbricks.com/api/soap/
> 
> It uses a config file that maps the provided node name to a
> ProfitBricks server ID. The call looks like this:
> 
> /usr/lib/stonith/plugins/external/profitbricks action hostname
>        Action can be: status, on, off, reset
>        Hostname needs to be on config file (/etc/pb.conf)
> 
> It works fine on basic manual tests.
> 
> 
> It also replies correctly to:
> 
> # stonith -t external/profitbricks -h
> 
> 
> STONITH Device: external/profitbricks - ProfitBricks host
> reboot/poweron/poweroff/status
> 
> For more information see http://profitbricks.com/
> 
> List of valid parameter names for external/profitbricks STONITH device:
> hostname
> For Config info [-p] syntax, give each of the above parameters in order as
> the -p value.
> Arguments are separated by white space.
> Config file [-F] syntax is the same as -p, except # at the start of a line
> denotes a comment
> 
> 
> 
> But I'm having a hard time figuring out how to make it work with STONITH
> (or maybe understanding how STONITH works - sorry, I'm a total newbie on
> this) and how to configure it with Pacemaker/Corosync.
> 
> All this to request your help on the following two points:
> 
> 
> 1. When I run:
> 
> # stonith -t external/profitbricks -p status node1
> 
> I get a node reset.

:)

If you run stonith without arguments, it'll print basic usage.

You can get status like this:

# stonith -t external/profitbricks hostname=node1 -S

Node list:

# stonith -t external/profitbricks hostname=node1 -l

Reset:

# stonith -t external/profitbricks hostname=node1 -T reset node1

> And when I run:
> 
> # stonith -t external/profitbricks -p node1
> 
> I get the default stonith usage help, as if my syntax were wrong.

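The stonith(8) interface takes some getting used to. The -p option
expects the device parameter values, not an action. So in your first
command "status" was taken as the value of the hostname parameter and
node1 as the node to act on, and the default action is reset. In this
one there is no node name and no -l or -S, hence the usage message.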

> I've read and researched a lot, but couldn't figure out what I'm doing
> wrong here, although it seems to be a pretty basic mistake.
> 
> 
> 
> 2. I have a setup with Pacemaker/Corosync configured. Virtual/floating IP
> (resource) works just fine and solid. Then I'm trying to add STONITH using
> the ProfitBricks plugin. I have a file with the following configuration for
> crm:
> 
> configure
> primitive st-node1 stonith:external/profitbricks \
> params hostname=node1
> primitive st-node2 stonith:external/profitbricks \
> params hostname=node2
> location l-st-node1 st-node1 -inf: node1
> location l-st-node2 st-node2 -inf: node2
> commit
> 
> I call it like:
> 
> crm < file.config
> 
> 
> Then I see the fencing agent's usage error (printed when $1 is not
> recognised):
> 
> Usage: /usr/lib/stonith/plugins/external/profitbricks action hostname
>        Action can be: status, on, off, reset
>        Hostname needs to be on config file (/etc/pb.conf)

The configuration looks fine, so there's probably something wrong
with the agent itself. I guess you'll need to debug.
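
For reference, here is a rough sketch of the dispatch an external
plugin is expected to implement, as far as I recall it from the
README.external shipped with cluster-glue. The action arrives as the
only argument for everything except on/off/reset (which also get the
target node), and the resource parameters (here hostname) arrive as
environment variables, not as $2. An agent that insists on "action
hostname" would therefore print its usage on status or gethosts. The
pb_api function is a hypothetical placeholder for the real SOAP call,
and the description strings are only illustrative.

```shell
#!/bin/sh
# Sketch only: pb_api is a hypothetical stand-in for the ProfitBricks
# SOAP call; here it just prints what it would do.
pb_api() {
    echo "pb_api $*"
}

# $1 is the action; $2 (the target node) is set only for on/off/reset.
# The configured parameter is read from the environment as $hostname.
pb_agent() {
    action=$1
    target=$2
    case "$action" in
    gethosts)
        # Nodes this device can fence, one per line
        echo "$hostname"
        ;;
    on|off|reset)
        pb_api "$action" "$target"
        ;;
    status)
        # Called without a node name: check that the device is usable
        pb_api status "$hostname"
        ;;
    getconfignames)
        echo "hostname"
        ;;
    getinfo-devid|getinfo-devname)
        echo "ProfitBricks STONITH device"
        ;;
    getinfo-devdescr)
        echo "ProfitBricks host reboot/poweron/poweroff/status"
        ;;
    getinfo-devurl)
        echo "http://profitbricks.com/"
        ;;
    getinfo-xml)
        cat <<'EOF'
<parameters>
<parameter name="hostname" unique="1" required="1">
<content type="string" />
<shortdesc lang="en">Hostname</shortdesc>
<longdesc lang="en">Name of the node this device can fence</longdesc>
</parameter>
</parameters>
EOF
        ;;
    *)
        # Unknown action: fail quietly instead of printing usage
        return 1
        ;;
    esac
}
```

In a real plugin the script would end with pb_agent "$@". To exercise
it by hand the way the cluster would, set the parameter in the
environment and pass only the action, for example
hostname=node1 ./profitbricks status.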

> crm_mon will give me:
> 
> # crm_mon -1
> Last updated: Sat May  9 20:12:47 2015
> Last change: Sat May  9 20:12:40 2015 via cibadmin on node1
> Stack: corosync
> Current DC: node1 (1084751975) - partition with quorum
> Version: 1.1.10-42f2063
> 2 Nodes configured
> 3 Resources configured
> 
> 
> Online: [ node1 node2 ]
> 
>  VIP (ocf::heartbeat:IPaddr2): Started node1
> 
> Failed actions:
>     st-node2_start_0 (node=node1, call=64, rc=1, status=Error,
> last-rc-change=Sat May  9 20:12:41 2015
> , queued=3195ms, exec=0ms
> ): unknown error
>     st-node1_start_0 (node=node2, call=18, rc=1, status=Error,
> last-rc-change=Sat May  9 20:12:40 2015
> , queued=4202ms, exec=0ms
> ): unknown error
> 
> 
> And inside crm both services are stopped:
> 
> # crm
> crm(live)# resource list
>  VIP (ocf::heartbeat:IPaddr2): Started
>  st-node1 (stonith:external/profitbricks): Stopped
>  st-node2 (stonith:external/profitbricks): Stopped
> 
> 
> Would you guys help me figure out what's blocking me from going ahead on
> this story?
> 
> I know the fencing agent is still not obeying all the rules described at
> https://fedorahosted.org/cluster/wiki/FenceAgentAPI, but that's not the
> reason why I'm getting the errors. I would like to understand what's
> causing them before going ahead, also as a way to learn more about the
> whole thing.

There's a slight confusion here. The agent you wrote conforms to
the Linux HA stonith API. RH fence-agents are somewhat different.

Thanks,

Dejan

> 
> Sorry for the long email, and thank you so much in advance for the help.
> 
> 
> Cheers,
> -- 
> *Tiago Santos*
