[Pacemaker] Problem with configuring stonith rcd_serial

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Oct 29 08:43:35 EDT 2010


On Fri, Oct 29, 2010 at 02:04:38PM +0200, Eberhard Kuemmerle wrote:
> On 28 Oct 2010 16:58, Dejan Muhamedagic wrote:
> 
> >
> >> Despite that, I tried a stonith reset with that config and the modified
> >> rcd_serial.so, but it failed:
> >>
> >> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> >> not parse (0 2): ** (process:24232): DEBUG: rcd_serial_set_config:called
> >> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> >> not parse (3 19): (process:24232): DEBUG: rcd_serial_set_config:called
> >> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> >> not parse (0 0):
> >> Oct 28 10:58:56 node1 stonith-ng: [5224]: ERROR: log_operation:
> >> Operation 'reboot' [24240] for host 'node2' with device 'stonith-P:1'
> >> returned: 1 (call 0 from (null))
> >>
> > No other messages?
> >
> All messages with stonith-ng are:
> 
> Oct 26 11:07:10 node1 stonith-ng: [5348]: info: Invoked:
> /usr/lib64/heartbeat/stonithd
> Oct 26 11:07:10 node1 stonith-ng: [5348]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Oct 28 08:55:38 node1 stonith-ng: [5224]: info: Invoked:
> /usr/lib64/heartbeat/stonithd
> Oct 28 08:55:38 node1 stonith-ng: [5224]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Oct 28 10:54:23 node1 stonith-ng: [5224]: notice: stonith_device_action:
> Device stonith-P:1 not found
> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> not parse (0 2): ** (process:24232): DEBUG: rcd_serial_set_config:called
> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> not parse (3 19): (process:24232): DEBUG: rcd_serial_set_config:called
> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> not parse (0 0):
> Oct 28 10:58:56 node1 stonith-ng: [5224]: ERROR: log_operation:
> Operation 'reboot' [24240] for host 'node2' with device 'stonith-P:1'
> returned: 1 (call 0 from (null))
> Oct 28 10:59:56 node1 stonith-ng: [5224]: ERROR: remote_op_timeout:
> Action reboot (5255ad54-96b4-42d9-9596-71e68be339b3) for node2 timed out
> 
> > You can also try to test it on the command line:
> >
> > # stonith -t rcd_serial hostlist=... ... -lS
> > # stonith -t rcd_serial hostlist=... ... -T reset node
> >
> stonith -t rcd_serial -p "test /dev/ttyS0 rts 2000" test
> ** (process:21181): DEBUG: rcd_serial_set_config:called
> Alarm clock
> ==> RESET WORKS!
> 
> stonith -t rcd_serial hostlist="node1 node2" ttydev="/dev/ttyS0"
> dtr\|rts="rts" msduration="2000" -S
> ** (process:28054): DEBUG: rcd_serial_set_config:called
> stonith: rcd_serial device OK.
> 
> stonith -t rcd_serial hostlist="node1 node2" ttydev="/dev/ttyS0"
> dtr\|rts="rts" msduration="2000" -l
> ** (process:27543): DEBUG: rcd_serial_set_config:called
> node1 node2
> 
> stonith -t rcd_serial hostlist='node1 node2' ttydev="/dev/ttyS0"
> dtr\|rts="rts" msduration="2000" -T reset node2
> ** (process:29624): DEBUG: rcd_serial_set_config:called
> ** (process:29624): CRITICAL **: rcd_serial_reset_req: host 'node2' not
> in hostlist.

And this message never appears in the logs?

> ==> RESET FAILED
> 
> stonith -t rcd_serial hostlist='node1, node2' ttydev="/dev/ttyS0"
> dtr\|rts="rts" msduration="2000" -T reset node2
> ** (process:26929): DEBUG: rcd_serial_set_config:called
> ** (process:26929): CRITICAL **: rcd_serial_reset_req: host 'node2' not
> in hostlist.
> ==> RESET FAILED (notice: hostlist is comma separated here)
> 
> stonith -t rcd_serial hostlist="node1 node2" ttydev="/dev/ttyS0"
> dtr\|rts="rts" msduration="2000" -T reset "node1 node2"
> ==> RESET WORKS, BUT the argument <<reset "node1 node2">> is shit...
> ==> There seems to be a problem with parsing the host list!

It turns out that the hostlist can contain just one node. That
makes sense, since you can reach only one host over the serial
cable. The plugin also makes no attempt to check whether the
hostlist looks meaningful, i.e. it treats "node1 node2" as a
single node name (as you've shown above).

So, you'll need to configure two stonith resources, one per node.
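
As a rough sketch of what the single-host form looks like on the
command line: the helper below only prints the reset command, it does
not run it. The function name rcd_reset_cmd is made up for
illustration, and the tty, line and duration values are simply copied
from your commands above; which node is actually wired to which serial
port is an assumption you'll have to check.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of stonith or cluster-glue).
# It prints a reset command whose hostlist names exactly one host,
# because rcd_serial treats the whole hostlist string as one node name.
rcd_reset_cmd() {
    target=$1                   # the host reachable over this serial cable
    ttydev=${2:-/dev/ttyS0}     # assumption: value taken from the thread
    msduration=${3:-2000}       # assumption: value taken from the thread
    # '\\|' in the format prints a literal '\|', matching the escaped
    # parameter name dtr\|rts used in the commands quoted above.
    printf 'stonith -t rcd_serial hostlist="%s" ttydev="%s" dtr\\|rts="rts" msduration="%s" -T reset %s\n' \
        "$target" "$ttydev" "$msduration" "$target"
}

# On node1, whose serial port is assumed to be wired to node2:
rcd_reset_cmd node2
```

The other node would use the same form with hostlist naming node1,
which is why each node ends up with its own stonith resource.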

> > Add -d to get debugging messages.
> >
> stonith -t rcd_serial hostlist='node1 node2' ttydev="/dev/ttyS0"
> dtr\|rts="rts" msduration="2000" -T reset node2 -d
> ** (process:23673): DEBUG: NewPILPluginUniv(0x605060)
> ** (process:23673): DEBUG: PILS: Plugin path =
> /usr/lib64/stonith/plugins:/usr/lib64/heartbeat/plugins
> ** (process:23673): DEBUG: NewPILInterfaceUniv(0x6051a0)
> ** (process:23673): DEBUG: NewPILPlugintype(0x605550)
> ** (process:23673): DEBUG: NewPILPlugin(0x605cc0)
> ** (process:23673): DEBUG: NewPILInterface(0x605d10)
> ** (process:23673): DEBUG:
> NewPILInterface(0x605d10:InterfaceMgr/InterfaceMgr)*** user_data: 0x0
> *******
> ** (process:23673): DEBUG:
> InterfaceManager_plugin_init(0x605d10/InterfaceMgr)
> ** (process:23673): DEBUG: Registering Implementation manager for
> Interface type 'InterfaceMgr'
> ** (process:23673): DEBUG: PILS: Looking for InterfaceMgr/generic =>
> [/usr/lib64/stonith/plugins/InterfaceMgr/generic.so]
> ** (process:23673): DEBUG: Plugin file
> /usr/lib64/stonith/plugins/InterfaceMgr/generic.so does not exist
> ** (process:23673): DEBUG: PILS: Looking for InterfaceMgr/generic =>
> [/usr/lib64/heartbeat/plugins/InterfaceMgr/generic.so]
> ** (process:23673): DEBUG: Plugin path for InterfaceMgr/generic =>
> [/usr/lib64/heartbeat/plugins/InterfaceMgr/generic.so]
> ** (process:23673): DEBUG: PluginType InterfaceMgr already present
> ** (process:23673): DEBUG: Plugin InterfaceMgr/generic  init function:
> InterfaceMgr_LTX_generic_pil_plugin_init
> ** (process:23673): DEBUG: NewPILPlugin(0x605dc0)
> ** (process:23673): DEBUG: Plugin InterfaceMgr/generic loaded and
> constructed.
> ** (process:23673): DEBUG: Calling init function in plugin
> InterfaceMgr/generic.
> ** (process:23673): DEBUG: NewPILInterface(0x606810)
> ** (process:23673): DEBUG:
> NewPILInterface(0x606810:InterfaceMgr/stonith2)*** user_data: 0x605e10
> *******
> ** (process:23673): DEBUG: Registering Implementation manager for
> Interface type 'stonith2'
> ** (process:23673): DEBUG: IfIncrRefCount(1 + 1 )
> ** (process:23673): DEBUG: PluginIncrRefCount(0 + 1 )
> ** (process:23673): DEBUG: IfIncrRefCount(1 + 100 )
> ** (process:23673): DEBUG: PILS: Looking for stonith2/rcd_serial =>
> [/usr/lib64/stonith/plugins/stonith2/rcd_serial.so]
> ** (process:23673): DEBUG: Plugin path for stonith2/rcd_serial =>
> [/usr/lib64/stonith/plugins/stonith2/rcd_serial.so]
> ** (process:23673): DEBUG: Creating PluginType for stonith2
> ** (process:23673): DEBUG: NewPILPlugintype(0x605120)
> ** (process:23673): DEBUG: Plugin stonith2/rcd_serial  init function:
> stonith2_LTX_rcd_serial_pil_plugin_init
> ** (process:23673): DEBUG: NewPILPlugin(0x606920)
> ** (process:23673): DEBUG: Plugin stonith2/rcd_serial loaded and
> constructed.
> ** (process:23673): DEBUG: Calling init function in plugin
> stonith2/rcd_serial.
> ** (process:23673): DEBUG: NewPILInterface(0x606bb0)
> ** (process:23673): DEBUG:
> NewPILInterface(0x606bb0:stonith2/rcd_serial)*** user_data:
> 0x7f8d036071c0 *******
> ** (process:23673): DEBUG: IfIncrRefCount(101 + 1 )
> ** (process:23673): DEBUG: PluginIncrRefCount(0 + 1 )
> ** (process:23673): DEBUG: rcd_serial_set_config:called
> 
> ** (process:23673): CRITICAL **: rcd_serial_reset_req: host 'node2' not
> in hostlist.
> ** (process:23673): DEBUG: IfIncrRefCount(1 + -1 )
> ** (process:23673): DEBUG: RemoveAPILInterface(0x606bb0/rcd_serial)
> ** (process:23673): DEBUG: RmAPILInterface(0x606bb0/rcd_serial)
> ** (process:23673): DEBUG: PILunregister_interface(stonith2/rcd_serial)
> ** (process:23673): DEBUG: Calling InterfaceClose on stonith2/rcd_serial
> ** (process:23673): DEBUG: IfIncrRefCount(102 + -1 )
> ** (process:23673): DEBUG: PluginIncrRefCount(1 + -1 )
> ** (process:23673): DEBUG: RemoveAPILPlugin(stonith2/rcd_serial)
> ** (process:23673): DEBUG: RmAPILPlugin(stonith2/rcd_serial)
> ** (process:23673): DEBUG: Closing dlhandle for (stonith2/rcd_serial)
> ** (process:23673): DEBUG: RmAPILPluginType(stonith2)
> ** (process:23673): DEBUG: DelPILPluginType(stonith2)
> ** (process:23673): DEBUG: DelPILInterface(0x606bb0/rcd_serial)
> 
> > BTW, can't recall somebody using rcd_serial. Why don't you use a
> > real fencing device?
> >
> Oh, rcd_serial is simple and cheap (our electronics workshop manufactured
> the hardware)
> and I think it would provide exactly what we need, if it worked!

Fair enough. Indeed, this device could be considered one of the
most robust.

Thanks,

Dejan

> Best regards,
>   Eberhard