[Pacemaker] Problem with configuring stonith rcd_serial

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Oct 28 10:58:28 EDT 2010


Hi,

On Thu, Oct 28, 2010 at 03:19:07PM +0200, Eberhard Kuemmerle wrote:
> On 27 Oct 2010 11:52, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Tue, Oct 26, 2010 at 09:33:17AM +0200, Eberhard Kuemmerle wrote:
> >> Hi,
> >>
> >> I try to configure stonith and get an error message that I don't understand:
> >>
> >> crm(live)# configure primitive stonith-P stonith::rcd_serial params
> >> hostlist="node1 node2" ttydev="/dev/ttyS0" msduration="2000"
> >> dtr|rts="rts"  op monitor interval="60s"
> >> element nvpair: Relax-NG validity error : Type ID doesn't allow value
> >> 'stonith-P-instance_attributes-dtr|rts'
> >> Relax-NG validity error : Element nvpair failed to validate attributes
> >> element nvpair: Relax-NG validity error : Invalid attribute id for
> >> element nvpair
> >> Relax-NG validity error : Extra element nvpair in interleave
> >> element nvpair: Relax-NG validity error : Element instance_attributes
> >> failed to validate content
> >> element cib: Relax-NG validity error : Element cib failed to validate
> >> content
> >> crm_verify[5810]: 2010/10/26_09:20:31 ERROR: main: CIB did not pass
> >> DTD/schema validation
> >> Errors found during check: config not valid
> >>
> >> If I remove the parameter dtr|rts="rts", the error is:
> >>
> >> crm(live)# configure primitive stonith-P stonith::rcd_serial params
> >> hostlist="node1 node2" ttydev="/dev/ttyS0" msduration="2000"
> >> ERROR: stonith-P: required parameter dtr|rts not defined
> >>
> >> so the parameter name dtr|rts seems to be ok.
> >
> > The shell builds ids (which you see up there) appending instance
> > attribute names. The name dtr|rts contains an invalid character
> > (|) for the XML ID attribute type. Need to fix that.
> >
> > In the meantime, you can either use cibadmin to define this
> > primitive, or edit the xml after defining the stonith resource:
> >
> > crm(live)configure# primitive stonith-P stonith::rcd_serial params ...
> > crm(live)configure# edit xml stonith-P
> >
> > Find the dtr|rts nvpair and replace "|" with "_" in the id
> > attribute. The shell may ask whether you want to edit again;
> > just answer no.
> >
> > Thanks,
> >
> > Dejan
> Hi Dejan,
> 
> thank you for your answer. Configuring with cibadmin worked, the xml is:
> 
>       <clone id="stonith">
>         <meta_attributes id="stonith-meta_attributes">
>           <nvpair id="stonith-meta_attributes-globally-unique"
> name="globally-unique" value="false"/>
>           <nvpair id="stonith-meta_attributes-clone-max"
> name="clone-max" value="2"/>
>           <nvpair id="stonith-meta_attributes-clone-node-max"
> name="clone-node-max" value="1"/>
>           <nvpair id="stonith-meta_attributes-target-role"
> name="target-role" value="Stopped"/>
>         </meta_attributes>
>         <primitive class="stonith" id="stonith-P" type="rcd_serial">
>           <instance_attributes id="stonith-P-instance_attributes">
>             <nvpair id="stonith-P-instance_attributes-hostlist"
> name="hostlist" value="node1 node2"/>
>             <nvpair id="stonith-P-instance_attributes-ttydev"
> name="ttydev" value="/dev/ttyS0"/>
>             <nvpair id="stonith-P-instance_attributes-dtr_rts"
> name="dtr|rts" value="rts"/>
>             <nvpair id="stonith-P-instance_attributes-msduration"
> name="msduration" value="2000"/>
>           </instance_attributes>
>           <operations>
>             <op id="stonith-P-monitor-60s" interval="60s" name="monitor"/>
>           </operations>
>         </primitive>
>       </clone>
> 
> But when I start the resource stonith, I get the following errors in
> /var/log/messages:
> 
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML Error:
> Entity: line 1: parsererror : Specification mandate value for attribute dtr
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML Error: se"
> CRM_meta_notify="false" CRM_meta_timeout="20000" crm_feature_set="3.0.2" dtr
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML
> Error:
> ^
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML Error:
> Entity: line 1: parsererror : attributes construct error
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML Error: se"
> CRM_meta_notify="false" CRM_meta_timeout="20000" crm_feature_set="3.0.2" dtr
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML
> Error:
> ^
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML Error:
> Entity: line 1: parsererror : Couldn't find end of Start Tag attributes
> line 1
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML Error: se"
> CRM_meta_notify="false" CRM_meta_timeout="20000" crm_feature_set="3.0.2" dtr
> Oct 28 13:20:01 node1 crmd: [5229]: ERROR: crm_xml_err: XML
> Error:
> ^
> Oct 28 13:12:48 node1 crmd: [5229]: WARN: string2xml: Parsing failed
> (domain=1, level=3, code=73): Couldn't find end of Start Tag attributes
> line 1
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: string2xml: Couldn't fully
> parse 3961 chars: <crm_xml><transition_graph cluster-delay="60s"
> stonith-timeout="60s" failed-stop-offset="
> INFINITY" failed-start-offset="INFINITY" batch-limit="30"
> transition_id="39"><synapse id="0"><action_set><rsc_op id="136"
> operation="start" operation_key="stonith-P:0_start_0" on
> _node="node2" on_node_uuid="node2"><primitive id="stonith-P:0"
> long-id="stonith:stonith-P:0" class="stonith"
> type="rcd_serial"/><attributes CRM_meta_clone="0" CRM_meta_clone_ma
> x="2" CRM_meta_clone_node_max="1" CRM_meta_globally_unique="false"
> CRM_meta_notify="false" CRM_meta_timeout="20000" crm_feature_set="3.0.2"
> dtr|rts="rts" hostlist="node1 node2"
>  msduration="2000"
> ttydev="/dev/ttyS0"/></rsc_op></action_set><inputs><trigger><pseudo_event id="140"
> operation="start" operation_key="stonith_start_0"/></trigger></inputs></syna
> pse><synapse id="1"><action_set><rsc_op id="137" operation="monitor"
> operation_key="stonith-P:0_monitor_60000" on_node="node2"
> on_node_uuid="node2"><primitive id="stonith-P:0"
> long-id="stonith:stonith-P:0" class="stonith"
> type="rcd_serial"/><attributes CRM_meta_clone="0" CRM_meta_clone_max="2"
> CRM_meta_clone_node_max="1" CRM_meta_globally_unique="false
> " CRM_meta_interval="60000" CRM_meta_name="monitor"
> CRM_meta_notify="false" CRM_meta_timeout="20000" crm_feature_set="3.0.2"
> dtr|rts="rts" hostlist="node1 node2" msduration="20
> 00" ttydev="/dev/ttyS0"/></rsc_op></action_set><inputs><trigger><rsc_op
> id="136" operation="start" operation_key="stonith-P:0_start_0"
> on_node="node2" on_node_uuid="node2"/></t
> rigger></inputs></synapse><synapse id="2"><action_set><rsc_op id="138"
> operation="start" operation_key="stonith-P:1_start_0" on_node="node1"
> on_node_uuid="node1"><primitive id=
> "stonith-P:1" long-id="stonith:stonith-P:1" class="stonith"
> type="rcd_serial"/><attributes CRM_meta_clone="1" CRM_meta_clone_max="2"
> CRM_meta_clone_node_max="1" CRM_meta_globally
> _unique="false" CRM_meta_notify="false" CRM_meta_timeout="20000"
> crm_feature_set="3.0.2" dtr|
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial <crm_xml >
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial   <transition_graph cluster-delay="60s" stonith-timeout="60s"
> failed-stop-offset="INFINITY" failed-start-offset="INFINITY"
> batch-limit="30" transition_id="39" >
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial     <synapse id="0" >
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial       <action_set >
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial         <rsc_op id="136" operation="start"
> operation_key="stonith-P:0_start_0" on_node="node2" on_node_uuid="node2" >
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial           <primitive id="stonith-P:0"
> long-id="stonith:stonith-P:0" class="stonith" type="rcd_serial" />
> Oct 28 13:12:48 node1 lrmd: [5226]: ERROR: crm_abort: crm_strdup_fn:
> Triggered assert at utils.c:964 : src != NULL
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial           <attributes CRM_meta_clone="0" CRM_meta_clone_max="2"
> CRM_meta_clone_node_max="1" CRM_meta_globally_unique="false"
> CRM_meta_notify="false" CRM_meta_timeout="20000" crm_feature_set="3.0.2" />
> Oct 28 13:12:48 node1 lrmd: [5226]: ERROR: crm_strdup_fn: Could not
> perform copy at st_client.c:514 (stonith_api_device_metadata)
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial         </rsc_op>
> Oct 28 13:12:48 node1 lrmd: [5226]: WARN: stonith_api_device_metadata:
> no short description in rcd_serial's metadata.
> Oct 28 13:12:48 node1 crmd: [5229]: ERROR: log_data_element: string2xml:
> Partial       </action_set>
> .....

Oops. So the "|" is definitely out: it isn't valid in an XML
attribute name either, which is why crmd can't parse the
transition graph. Well, it's a silly name too. Looks like we'll
have to change it.
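
To illustrate both failures (the rejected nvpair id and the
unparsable attributes element), here is a minimal Python sketch.
It is not the shell's or crmd's actual code: the helper names are
hypothetical, and the name check is a conservative ASCII-only
approximation of what XML allows.

```python
import re
import xml.etree.ElementTree as ET

# Conservative ASCII subset of the XML name rules (assumption: real XML
# also admits many non-ASCII characters; colons are excluded because
# the ID type does not allow them).
_SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")


def is_safe_xml_name(name):
    """Return True if name is usable as an attribute name or ID value."""
    return _SAFE_NAME.match(name) is not None


def make_nvpair_id(set_id, param):
    """Hypothetical helper: build an nvpair id from the attribute-set id
    and the parameter name, replacing unsafe characters with '_'."""
    return set_id + "-" + re.sub(r"[^A-Za-z0-9_.-]", "_", param)


def parses(fragment):
    """Return True if the XML fragment is well-formed."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_safe_xml_name("dtr|rts"))
    print(make_nvpair_id("stonith-P-instance_attributes", "dtr|rts"))
    print(parses('<attributes dtr|rts="rts"/>'))
    print(parses('<attributes dtr_rts="rts"/>'))
```

The sanitized id matches the one used in the cibadmin XML above,
but fixing the id alone is not enough: the parameter name itself
ends up as an attribute name in the transition graph, and the
parser rejects it there.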

> I also tried to edit /usr/lib64/stonith/plugins/stonith2/rcd_serial.so,
> I replaced 'dtr|rts' by 'dtr_rts' there.
> Then, I could configure AND start the resource with the following config
> (with crm configure):
> 
> primitive stonith-P stonith:rcd_serial \
>         params hostlist="node1 node2" ttydev="/dev/ttyS0" dtr_rts="rts"
> msduration="2000" \
>         op monitor interval="60s"
> clone stonith stonith-P \
>         meta globally-unique="false" clone-max="2" clone-node-max="1"
> 
> In xml, the dtr/rts is now:
> 
>             <nvpair id="stonith-P-instance_attributes-dtr_rts"
> name="dtr_rts" value="rts"/>
> 
> 
> With that, I got fewer errors but still the following:
> 
> Oct 28 10:54:23 node1 pengine: [5228]: notice: LogActions: Start
> stonith-P:0#011(node2)
> Oct 28 10:54:23 node1 pengine: [5228]: notice: LogActions: Start
> stonith-P:1#011(node1)
> Oct 28 10:54:23 node1 lrmd: [5226]: notice: lrmd_rsc_new(): No
> lrm_rprovider field in message
> Oct 28 10:54:23 node1 lrmd: [5226]: info: rsc:stonith-P:1:89: probe
> Oct 28 10:54:23 node1 stonith-ng: [5224]: notice: stonith_device_action:
> Device stonith-P:1 not found
> Oct 28 10:54:23 node1 lrmd: [5226]: info: rsc:stonith-P:1:90: start
> Oct 28 10:54:23 node1 lrmd: [5226]: ERROR: crm_abort: crm_strdup_fn:
> Triggered assert at utils.c:964 : src != NULL
> Oct 28 10:54:23 node1 lrmd: [5226]: ERROR: crm_strdup_fn: Could not
> perform copy at st_client.c:514 (stonith_api_device_metadata)
> Oct 28 10:54:23 node1 lrmd: [5226]: WARN: stonith_api_device_metadata:
> no short description in rcd_serial's metadata.
> Oct 28 10:54:23 node1 lrmd: [5226]: info: stonithRA plugin: got
> metadata: <?xml version="1.0"?>#012<!DOCTYPE resource-agent SYSTEM
> "ra-api-1.dtd">#012<resource-agent name="rcd_serial">#012
> <version>1.0</version>#012  <longdesc lang="en">#012RC Delayed Serial
> STONITH Device#012This device can be constructed cheaply from readily
> available components,#012with sufficient expertise and testing.#012See
> README.rcd_serial for circuit diagram.#012#012  </longdesc>#012
> <shortdesc lang="en"><!-- no value
> --></shortdesc>#012<parameters><parameter name="hostlist" unique="1"
> required="1"><content type="string" />#012<shortdesc
> lang="en">#012Hostlist</shortdesc>#012<longdesc lang="en">#012The list
> of hosts that the STONITH device
> controls</longdesc>#012</parameter>#012<parameter name="ttydev"
> unique="1" required="1"><content type="string" />#012<shortdesc
> lang="en">#012TTY Device</shortdesc>#012<longdesc lang="en">#012The TTY
> device used for connecting to the STONITH
> device</longdesc>#012</parameter>#012<parameter name="dtr_rts"
> unique="1" required="1"><content type="string" />#012<shortdesc
> lang="en">#012dtr_rts</shortdesc>#012<longdesc lang="en">#012The
> hardware handshaking technique to use with ttydev("dtr" or
> "rts")</longdesc>#012</parameter>#012<parameter name="msduration"
> unique="1" required="1"><content type="string" />#012<shortdesc
> lang="en">#012msduration</shortdesc>#012<longdesc lang="en">#012The
> delay duration (in milliseconds) between the assertion of the control
> signal on ttydev and the closing of the reset
> switch</longdesc>#012</parameter>#012</parameters>#012  <actions>#012
> <action name="start"   timeout="15" />#012    <action name="stop"
> timeout="15" />#012    <action name="status"  timeout="15" />#012
> <action name="monitor" timeout="15" interval="15" start-delay="15"
> />#012    <action name="meta-data"  timeout="15" />#012  </actions>#012
> <special tag="heartbeat">#012    <version>2.0</version>#012
> </special>#012</resource-agent>
> Oct 28 10:54:23 node1 lrmd: [5226]: info: rsc:stonith-P:1:91: monitor
> Oct 28 10:54:23 node1 stonith: rcd_serial device OK.

These errors are unrelated, and should've been fixed by now. If
you still see them with the latest release, please open a bug in
bugzilla.

> Despite that, I tried a stonith reset with that config and the modified
> rcd_serial.so, but it failed:
> 
> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> not parse (0 2): ** (process:24232): DEBUG: rcd_serial_set_config:called
> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> not parse (3 19): (process:24232): DEBUG: rcd_serial_set_config:called
> Oct 28 10:58:56 node1 stonith-ng: [5224]: WARN: parse_host_line: Could
> not parse (0 0):
> Oct 28 10:58:56 node1 stonith-ng: [5224]: ERROR: log_operation:
> Operation 'reboot' [24240] for host 'node2' with device 'stonith-P:1'
> returned: 1 (call 0 from (null))

No other messages?

You can also try to test it on the command line:

# stonith -t rcd_serial hostlist=... ... -lS
# stonith -t rcd_serial hostlist=... ... -T reset node

Add -d to get debugging messages.
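
If you want to script those two tests, something like the
following Python sketch could assemble the command lines. It only
builds and prints them; it does not run anything. The helper name
is hypothetical, the parameter values are the ones from your
configuration earlier in this thread, and putting -d first is my
assumption about option order. Note that a name containing "|"
has to be quoted for the shell, which shlex.quote takes care of;
with your patched plugin the parameter would be dtr_rts instead.

```python
import shlex


def stonith_argv(device_type, params, action_args, debug=False):
    """Hypothetical helper: assemble an argv list for the stonith CLI.

    params is a dict of name -> value, passed as name=value arguments;
    action_args is the trailing part, e.g. ["-lS"] or ["-T", "reset", node].
    """
    argv = ["stonith"]
    if debug:
        argv.append("-d")  # debugging messages
    argv += ["-t", device_type]
    argv += ["%s=%s" % (name, value) for name, value in params.items()]
    argv += list(action_args)
    return argv


def shell_line(argv):
    """Quote each argument so the line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Values taken from the configuration quoted in this thread.
PARAMS = {
    "hostlist": "node1 node2",
    "ttydev": "/dev/ttyS0",
    "dtr|rts": "rts",
    "msduration": "2000",
}

if __name__ == "__main__":
    print(shell_line(stonith_argv("rcd_serial", PARAMS, ["-lS"])))
    print(shell_line(stonith_argv("rcd_serial", PARAMS,
                                  ["-T", "reset", "node2"], debug=True)))
```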

BTW, I can't recall anybody using rcd_serial. Why don't you use a
real fencing device?

Thanks,

Dejan

> Oct 28 10:58:56 node1 pengine: [5228]: WARN: process_pe_message:
> Transition 14: WARNINGs found during PE processing. PEngine Input stored
> in: /var/lib/pengine/pe-warn-0.bz2
> 
> What can I do?



