[Pacemaker] FS mount error

Thu Jul 22 04:23:56 EDT 2010

Please try:

# crm resource cleanup WebFS

This will fix the problem if the resource's fail-count has reached INFINITY.
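
To see which resources are in that state before cleaning up, you could scan the log for the pengine "Forcing ... away from ..." warnings. This is only a rough sketch: banned_resources is a hypothetical helper name, the log path in the usage comment is an assumption, and it expects each log message on a single line (as syslog writes it, not wrapped as in the quoted mail below).

```shell
#!/bin/sh
# Hypothetical helper: print "resource node" pairs that the policy engine
# has forced away from a node because the fail-count reached its maximum
# (1000000 is how INFINITY appears in the log).
# Reads log text on stdin; assumes the common_apply_stickiness message
# format seen in the quoted log, with each message on one line.
banned_resources() {
    sed -n 's/.*Forcing \([^ ]*\) away from \([^ ]*\) after \([0-9]*\) failures (max=\([0-9]*\)).*/\1 \2 \3 \4/p' |
    awk '$3 >= $4 { print $1, $2 }' |
    sort -u
}

# Possible usage (log path is an assumption, adjust for your syslog setup):
# banned_resources < /var/log/syslog | while read rsc node; do
#     crm resource cleanup "$rsc" "$node"
# done
```

The cleanup only resets the failure history. If the underlying start error is still there, the fail-count will climb again.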

Rgds,
Michael

On 2010/7/22 03:29 PM, Proskurin Kirill wrote:
> Hello all.
> 
> I am really new to Pacemaker and am trying to run some tests to learn how it
> all works. I am using the Clusters From Scratch PDF from clusterlabs.org as a how-to.
> 
> What we have:
> Debian Lenny 5.0.5 (with kernel 2.6.32-bpo.4-amd64 from backports)
> pacemaker 1.0.8+hg15494-4~bpo50+1
> openais 1.1.2-2~bpo50+1
> 
> 
> Problem:
> I am trying to add a filesystem mount resource but get an unknown error. If I
> mount it by hand, everything works.
> 
> crm_mon:
> 
> ============
> Last updated: Thu Jul 22 08:22:20 2010
> Stack: openais
> Current DC: node01.domain.org - partition with quorum
> Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ node02.domain.org node01.domain.org ]
> 
> ClusterIP       (ocf::heartbeat:IPaddr2):       Started node02.domain.org
>  Master/Slave Set: WebData
>      Masters: [ node02.domain.org ]
>      Slaves: [ node01.domain.org ]
> WebFS   (ocf::heartbeat:Filesystem):    Started node02.domain.org FAILED
> 
> Failed actions:
>     WebFS_start_0 (node=node01.domain.org, call=18, rc=1, status=complete): unknown error
>     WebFS_start_0 (node=node02.domain.org, call=301, rc=1, status=complete): unknown error
> 
> node01:~# crm_verify -VL
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node01.domain.org: unknown error (1)
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node02.domain.org: unknown error (1)
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: common_apply_stickiness:
> Forcing WebFS away from node01.domain.org after 1000000 failures
> (max=1000000)
> 
> 
> node01:~# crm configure show
> node node01.domain.org
> node node02.domain.org
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.100" cidr_netmask="32" \
>     op monitor interval="30s"
> primitive WebFS ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/var/spool/dovecot" fstype="ext4" \
>     op start interval="0" timeout="60s" \
>     op stop interval="0" timeout="60s" \
>     meta target-role="Started"
> primitive WebSite ocf:heartbeat:apache \
>     params configfile="/etc/apache2/apache2.conf" \
>     op monitor interval="1min" \
>     op start interval="0" timeout="40s" \
>     op stop interval="0" timeout="60s" \
>     meta target-role="Started"
> primitive wwwdrbd ocf:linbit:drbd \
>     params drbd_resource="drbd0" \
>     op monitor interval="60s" \
>     op start interval="0" timeout="240s" \
>     op stop interval="0" timeout="100s"
> ms WebData wwwdrbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> colocation WebSite-with-WebFS inf: WebSite WebFS
> colocation fs_on_drbd inf: WebFS WebData:Master
> colocation website-with-ip inf: WebSite ClusterIP
> order WebFS-after-WebData inf: WebData:promote WebFS:start
> order WebSite-after-WebFS inf: WebFS WebSite
> order apache-after-ip inf: ClusterIP WebSite
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1279717510"
> 
> 
> In logs:
> Jul 22 08:18:39 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:39 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:39 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:39 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:40 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:40 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:41 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:41 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:41 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:41 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:42 node01 cibadmin: [1199]: info: Invoked: cibadmin -Ql -o
> resources
> Jul 22 08:18:42 node01 cibadmin: [1200]: info: Invoked: cibadmin -p -R
> -o resources
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> <cib admin_epoch="0" epoch="143" num_updates="2" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>   <configuration >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>     <resources >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>       <primitive id="WebFS" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>         <meta_attributes id="WebFS-meta_attributes" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>           <nvpair value="Stopped" id="WebFS-meta_attributes-target-role" />
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>         </meta_attributes>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>       </primitive>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>     </resources>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
>   </configuration>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> </cib>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> <cib admin_epoch="0" epoch="144" num_updates="1" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>   <configuration >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>     <resources >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>       <primitive id="WebFS" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>         <meta_attributes id="WebFS-meta_attributes" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>           <nvpair value="Started" id="WebFS-meta_attributes-target-role" />
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>         </meta_attributes>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>       </primitive>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>     </resources>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
>   </configuration>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> </cib>
> Jul 22 08:18:42 node01 cib: [1810]: info: cib_process_request: Operation
> complete: op cib_replace for section resources (origin=local/cibadmin/2,
> version=0.144.1): ok (rc=0)
> Jul 22 08:18:42 node01 cib: [1201]: info: write_cib_contents: Archived
> previous version as /var/lib/heartbeat/crm/cib-89.raw
> Jul 22 08:18:42 node01 cib: [1201]: info: write_cib_contents: Wrote
> version 0.144.0 of the CIB to disk (digest:
> 5f51a15c21330c7ff76862ad9a5193b1)
> Jul 22 08:18:42 node01 cib: [1201]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.woPqNQ (digest:
> /var/lib/heartbeat/crm/cib.bF43Zi)
> Jul 22 08:18:42 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:42 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:42 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:42 node01 crmd: [1814]: info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Jul 22 08:18:42 node01 crmd: [1814]: info: need_abort: Aborting on
> change to admin_epoch
> Jul 22 08:18:42 node01 crmd: [1814]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jul 22 08:18:42 node01 crmd: [1814]: info: do_state_transition: All 2
> cluster nodes are eligible to run resources.
> Jul 22 08:18:42 node01 crmd: [1814]: info: do_pe_invoke: Query 350:
> Requesting the current CIB: S_POLICY_ENGINE
> Jul 22 08:18:42 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:43 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:43 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:43 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:43 node01 crmd: [1814]: info: do_pe_invoke_callback:
> Invoking the PE: query=350, ref=pe_calc-dc-1279783123-729, seq=152,
> quorate=1
> Jul 22 08:18:43 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:43 node01 pengine: [1813]: info: unpack_config: Node
> scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Jul 22 08:18:43 node01 pengine: [1813]: info: determine_online_status:
> Node node01.domain.org is online
> Jul 22 08:18:43 node01 pengine: [1813]: notice: unpack_rsc_op: Operation
> WebSite_monitor_0 found resource WebSite active on node01.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node01.domain.org: unknown error (1)
> Jul 22 08:18:43 node01 pengine: [1813]: info: determine_online_status:
> Node node02.domain.org is online
> Jul 22 08:18:43 node01 pengine: [1813]: notice: unpack_rsc_op: Operation
> WebSite_monitor_0 found resource WebSite active on node02.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node02.domain.org: unknown error (1)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: native_print:
> ClusterIP#011(ocf::heartbeat:IPaddr2):#011Started node02.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: notice: native_print:
> WebSite#011(ocf::heartbeat:apache):#011Stopped
> Jul 22 08:18:43 node01 pengine: [1813]: notice: clone_print:
> Master/Slave Set: WebData
> Jul 22 08:18:43 node01 pengine: [1813]: notice: short_print: Masters: [
> node02.domain.org ]
> Jul 22 08:18:43 node01 pengine: [1813]: notice: short_print: Slaves: [
> node01.domain.org ]
> Jul 22 08:18:43 node01 pengine: [1813]: notice: native_print:
> WebFS#011(ocf::heartbeat:Filesystem):#011Stopped
> Jul 22 08:18:43 node01 pengine: [1813]: info: get_failcount: WebFS has
> failed 1000000 times on node01.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: WARN: common_apply_stickiness:
> Forcing WebFS away from node01.domain.org after 1000000 failures
> (max=1000000)
> Jul 22 08:18:43 node01 pengine: [1813]: info: native_merge_weights:
> WebData: Rolling back scores from WebFS
> Jul 22 08:18:43 node01 pengine: [1813]: info: native_merge_weights:
> wwwdrbd:0: Rolling back scores from WebFS
> Jul 22 08:18:43 node01 pengine: [1813]: info: native_merge_weights:
> WebData: Rolling back scores from WebFS
> Jul 22 08:18:43 node01 pengine: [1813]: info: master_color: Promoting
> wwwdrbd:0 (Master node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: info: master_color: WebData:
> Promoted 1 instances of a possible 1 to master
> Jul 22 08:18:43 node01 pengine: [1813]: info: master_color: Promoting
> wwwdrbd:0 (Master node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: info: master_color: WebData:
> Promoted 1 instances of a possible 1 to master
> Jul 22 08:18:43 node01 pengine: [1813]: notice: RecurringOp:  Start
> recurring monitor (60s) for WebSite on node02.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Leave
> resource ClusterIP#011(Started node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Start
> WebSite#011(node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Leave
> resource wwwdrbd:0#011(Master node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Leave
> resource wwwdrbd:1#011(Slave node01.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Start
> WebFS#011(node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: info: process_pe_message:
> Transition 199: PEngine Input stored in: /var/lib/pengine/pe-input-243.bz2
> Jul 22 08:18:44 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:44 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:44 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:44 node01 crmd: [1814]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 22 08:18:44 node01 crmd: [1814]: info: unpack_graph: Unpacked
> transition 199: 4 actions in 4 synapses
> Jul 22 08:18:44 node01 crmd: [1814]: info: do_te_invoke: Processing
> graph 199 (ref=pe_calc-dc-1279783123-729) derived from
> /var/lib/pengine/pe-input-243.bz2
> Jul 22 08:18:44 node01 crmd: [1814]: info: te_rsc_command: Initiating
> action 42: start WebFS_start_0 on node02.domain.org
> Jul 22 08:18:44 node01 crmd: [1814]: info: te_rsc_command: Initiating
> action 5: probe_complete probe_complete on node02.domain.org - no waiting
> Jul 22 08:18:44 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:45 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:45 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:45 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:45 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:46 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:46 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:46 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:46 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:47 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:47 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:47 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:47 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
>