[Pacemaker] Shooting and diagnosis of stonith plugins

Takenaka Kazuhiro takenaka.kazuhiro at oss.ntt.co.jp
Fri Oct 10 02:30:27 EDT 2008


Hi all.

As far as I know, every stonith plugin is expected to verify that
its target has been fenced off from the other nodes before it
returns a successful status for 'reset' or 'off'.

However, I think this diagnosis places an excessive burden on each
individual plugin.

Plugin authors know how to handle the stonith devices their plugins
are written for, but they cannot always anticipate the structure of
the clusters on which those plugins will run.

When a cluster administrator wants to use a plugin whose diagnosis
does not suit the cluster, the administrator has no choice but to
modify the plugin directly.

This reduces the reusability of plugins and is undesirable.
One way to avoid the problem is to establish a scheme or convention
that lets plugins delegate the diagnosis to other plugins.

The two attached plugins are a sample implementation of this idea.
They work together through the attached cib.xml.

'sshAltered' only shoots its targets, and 'pingAllAddr' only checks
whether its targets are still active.

Here is a slightly more detailed explanation:

  When a failure makes it necessary for one node to shoot a
  corrupted node, the shooter node first uses 'sshAltered' to
  shoot the target node.

  'sshAltered' shoots its targets, but if the attribute 'shoot_only'
  is set to "yes" (as in the attached cib.xml), it never exits with a
  successful status. Therefore the next plugin, if one is defined,
  is always used.

  'pingAllAddr' checks whether the IP addresses of its targets, as
  specified in cib.xml, are still active. If none of the IP addresses
  respond, 'pingAllAddr' exits with a successful status; if any of
  them responds, it exits with an error status.
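The exit-status convention the two plugins follow can be sketched as below. This is a minimal Python model, not the attached scripts (which are not reproduced in this archive): the function names are hypothetical, the shoot and reachability checks are passed in as callables instead of actually running ssh or ping, and 'pingAllAddr' is read as reporting success only when none of the target's addresses respond.

```python
from typing import Callable, Iterable

# Exit statuses in the usual shell convention: 0 means success.
EXIT_SUCCESS = 0
EXIT_FAILURE = 1


def ssh_altered_status(shoot: Callable[[], bool], shoot_only: str) -> int:
    """Hypothetical model of 'sshAltered' handling 'reset' or 'off'.

    The target is always shot. With shoot_only == "yes" the plugin never
    reports success, so the next configured plugin is always consulted.
    """
    shot = shoot()
    if shoot_only == "yes":
        return EXIT_FAILURE
    return EXIT_SUCCESS if shot else EXIT_FAILURE


def ping_all_addr_status(addresses: Iterable[str],
                         responds: Callable[[str], bool]) -> int:
    """Hypothetical model of 'pingAllAddr'.

    Assumption: the target counts as dead only when none of its
    addresses respond; a single answering address means failure.
    """
    if any(responds(addr) for addr in addresses):
        return EXIT_FAILURE  # at least one address is still alive
    return EXIT_SUCCESS


if __name__ == "__main__":
    alive = {"192.168.0.2"}  # hypothetical addresses still answering
    addrs = ["192.168.0.2", "10.0.0.2"]
    print(ssh_altered_status(lambda: True, "yes"))            # 1
    print(ping_all_addr_status(addrs, lambda a: a in alive))  # 1
    alive.clear()
    print(ping_all_addr_status(addrs, lambda a: a in alive))  # 0
```

Because the shooter never claims success on its own, the decision about whether the target is really gone rests entirely with the diagnosing plugin.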

Once 'external/ssh' has been rewritten as 'sshAltered', there is
no need to rewrite it again in order to use other conditions for
confirming the target's death.

For example, if a cluster uses iSCSI shared storage and failover on
that cluster must wait until the iSCSI target devices have cleared
their connections to the corrupted node, this can be done with a
different kind of plugin in place of 'pingAllAddr'. Its task is to
ask the iSCSI target devices whether the connections have been
cleared.

The reverse is also true: any shooting plugin that follows the
convention described above can be combined with 'pingAllAddr'.
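How the pieces combine can be sketched as follows, assuming (as the post describes) that the fencing side tries the configured plugins in order and treats the target as fenced as soon as one of them exits with status 0. The `fence` function and all plugin callables are hypothetical stand-ins; the iSCSI checker is an invented placeholder for the kind of plugin mentioned above, not an existing plugin.

```python
from typing import Callable, Sequence

# A plugin is modelled as a callable that takes the target node name
# and returns an exit status (0 = success).
Plugin = Callable[[str], int]


def fence(target: str, plugins: Sequence[Plugin]) -> bool:
    """Try each plugin in order; fencing succeeds at the first status 0.

    A shoot-only plugin always returns non-zero, so control passes to
    the next plugin, which only has to diagnose whether the target is dead.
    """
    for plugin in plugins:
        if plugin(target) == 0:
            return True
    return False


if __name__ == "__main__":
    shot = set()
    swept = {"node3"}  # hypothetical: targets whose iSCSI sessions are gone

    def shoot_only(node: str) -> int:
        shot.add(node)  # pretend the node was reset
        return 1        # never report success on its own

    def ping_diagnoser(node: str) -> int:
        return 0 if node in shot else 1  # "no address answers" once shot

    def iscsi_diagnoser(node: str) -> int:
        return 0 if node in swept else 1  # "connections cleared" check

    # The same unmodified shooter is paired with either diagnoser.
    print(fence("node2", [shoot_only, ping_diagnoser]))   # True
    print(fence("node3", [shoot_only, iscsi_diagnoser]))  # True
    print(fence("node5", [shoot_only, iscsi_diagnoser]))  # False
```

Swapping the second entry of the list is the only change needed to move from one confirmation condition to another; the shooter stays untouched.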

The shoot-only role could also be specified by a separate attribute
on the tag instead, like this:

  <primitive type="external/ssh" class="stonith" task="shoot" ...>
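With such an attribute, the shoot-only decision would move out of the plugin and into whatever runs the plugins, so an unmodified 'external/ssh' could serve as the shooter. A minimal sketch of that idea, assuming a hypothetical `Device` record that carries the `task` value from the primitive tag; the names are invented for illustration and no existing Pacemaker code is implied.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Device:
    name: str
    run: Callable[[str], int]  # returns the plugin's exit status
    task: str = ""             # hypothetical value of task="..." on the tag


def fence_with_task(target: str, devices: Sequence[Device]) -> bool:
    """Run devices in order; a task="shoot" device's result is never final."""
    for dev in devices:
        status = dev.run(target)
        if dev.task == "shoot":
            # A shooter's exit status is ignored; always ask the next device.
            continue
        if status == 0:
            return True
    return False


if __name__ == "__main__":
    # Plain external/ssh exits 0, but it is marked as a shooter only.
    ssh = Device("external/ssh", lambda t: 0, task="shoot")
    ping = Device("pingAllAddr", lambda t: 1)  # target still answers
    print(fence_with_task("node2", [ssh, ping]))  # False
```

Here the shooter's own success is not trusted; only a non-shooting device can declare the target fenced.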

I hope we can reach some kind of agreement on this issue.

Best regards.
-- 
Takenaka Kazuhiro <takenaka.kazuhiro at oss.ntt.co.jp>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sshAltered
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081010/3ad09bc3/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pingAllAddr
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081010/3ad09bc3/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib.xml
Type: text/xml
Size: 3493 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081010/3ad09bc3/attachment.xml>

