[ClusterLabs] Fwd: Resource failure

Vijay Partha vijaysarathy94 at gmail.com
Mon Jul 27 02:17:10 EDT 2015


This is my messages log.

Jul 27 08:02:46 vmx-occ-005 apache(WebSite)[32477]: INFO: apache not running
Jul 27 08:02:46 vmx-occ-005 crmd[31424]:   notice: process_lrm_event:
Operation WebSite_monitor_60000: not running (node=node1, call=11, rc=7,
cib-update=15, confirmed=false)
Jul 27 08:02:46 vmx-occ-005 attrd[31422]:   notice: attrd_cs_dispatch:
Update relayed from node2
Jul 27 08:02:46 vmx-occ-005 attrd[31422]:   notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-WebSite (1)
Jul 27 08:02:46 vmx-occ-005 attrd[31422]:   notice: attrd_perform_update:
Sent update 12: fail-count-WebSite=1
Jul 27 08:02:46 vmx-occ-005 attrd[31422]:   notice: attrd_cs_dispatch:
Update relayed from node2
Jul 27 08:02:46 vmx-occ-005 attrd[31422]:   notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-WebSite (1437976962)
Jul 27 08:02:46 vmx-occ-005 attrd[31422]:   notice: attrd_perform_update:
Sent update 14: last-failure-WebSite=1437976962
Jul 27 08:02:46 vmx-occ-005 apache(WebSite)[32511]: INFO: apache is not
running.
Jul 27 08:02:46 vmx-occ-005 crmd[31424]:   notice: process_lrm_event:
Operation WebSite_stop_0: ok (node=node1, call=14, rc=0, cib-update=16,
confirmed=true)
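
For anyone reading along, the two operation results in the excerpt above can be pulled out with a short script. This is only a sketch: it assumes the crmd line format shown above, rejoins the lines the mail client wrapped, and the names `lrm_events`, `EVENT` and `OCF_RC` are made up here for illustration. The rc meanings are the standard OCF exit codes (0 is success, 7 is "not running").

```python
import re

# Excerpt of the syslog lines above, still wrapped as the mail client left them.
LOG = """\
Jul 27 08:02:46 vmx-occ-005 crmd[31424]:   notice: process_lrm_event:
Operation WebSite_monitor_60000: not running (node=node1, call=11, rc=7,
cib-update=15, confirmed=false)
Jul 27 08:02:46 vmx-occ-005 crmd[31424]:   notice: process_lrm_event:
Operation WebSite_stop_0: ok (node=node1, call=14, rc=0, cib-update=16,
confirmed=true)
"""

# Only the two OCF exit codes that occur in this excerpt.
OCF_RC = {0: "OCF_SUCCESS", 7: "OCF_NOT_RUNNING"}

# Hypothetical pattern, written against the line format in the excerpt only.
EVENT = re.compile(
    r"process_lrm_event: Operation (?P<op>\S+): (?P<status>[^(]+?) "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), rc=(?P<rc>\d+)"
)


def lrm_events(text):
    """Return (operation, node, rc, rc name) for each crmd operation result."""
    # Collapse all whitespace so wrapped log entries become one long line.
    flat = " ".join(text.split())
    return [
        (m["op"], m["node"], int(m["rc"]), OCF_RC.get(int(m["rc"]), "other"))
        for m in EVENT.finditer(flat)
    ]


events = lrm_events(LOG)
for event in events:
    print(event)
```

Run against the excerpt, this lists the failed 60-second monitor (rc=7) followed by the successful stop (rc=0), both on node1.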

This is my corosync log:


Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Forwarding cib_modify operation for section status to
master (origin=local/crmd/15)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       Diff: --- 0.38.65 2
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       Diff: +++ 0.38.66 (null)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=66
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       +
/cib/status/node_state[@id='node1']/lrm[@id='node1']/lrm_resources/lrm_resource[@id='WebSite']/lrm_rsc_op[@id='WebSite_last_failure_0']:
@operation_key=WebSite_monitor_60000,
@transition-key=9:119038:0:a5b747ee-4fbc-4f65-a690-29276791fd19,
@transition-magic=0:7;9:119038:0:a5b747ee-4fbc-4f65-a690-29276791fd19,
@call-id=11, @rc-code=7, @interval=60000, @last-rc-change=1437976966,
@exec-time=0, @op-digest=eddc33bef3f1592ad847638ee4
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=node1/crmd/15, version=0.38.66)
Jul 27 08:02:46 [31422] vmx-occ-005      attrd:   notice:
attrd_cs_dispatch:    Update relayed from node2
Jul 27 08:02:46 [31422] vmx-occ-005      attrd:   notice:
attrd_trigger_update:         Sending flush op to all hosts for:
fail-count-WebSite (1)
Jul 27 08:02:46 [31422] vmx-occ-005      attrd:   notice:
attrd_perform_update:         Sent update 12: fail-count-WebSite=1
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Forwarding cib_modify operation for section status to
master (origin=local/attrd/12)
Jul 27 08:02:46 [31422] vmx-occ-005      attrd:   notice:
attrd_cs_dispatch:    Update relayed from node2
Jul 27 08:02:46 [31422] vmx-occ-005      attrd:   notice:
attrd_trigger_update:         Sending flush op to all hosts for:
last-failure-WebSite (1437976962)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       Diff: --- 0.38.66 2
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       Diff: +++ 0.38.67 (null)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=67
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       ++
/cib/status/node_state[@id='node1']/transient_attributes[@id='node1']/instance_attributes[@id='status-node1']:
<nvpair id="status-node1-fail-count-WebSite" name="fail-count-WebSite"
value="1"/>
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=node1/attrd/12, version=0.38.67)
Jul 27 08:02:46 [31422] vmx-occ-005      attrd:   notice:
attrd_perform_update:         Sent update 14:
last-failure-WebSite=1437976962
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Forwarding cib_modify operation for section status to
master (origin=local/attrd/14)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       Diff: --- 0.38.67 2
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       Diff: +++ 0.38.68 (null)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=68
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       ++
/cib/status/node_state[@id='node1']/transient_attributes[@id='node1']/instance_attributes[@id='status-node1']:
<nvpair id="status-node1-last-failure-WebSite" name="last-failure-WebSite"
value="1437976962"/>
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=node1/attrd/14, version=0.38.68)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=node2/attrd/404, version=0.38.68)
Jul 27 08:02:46 [31421] vmx-occ-005       lrmd:     info:
cancel_recurring_action:      Cancelling operation WebSite_monitor_60000
Jul 27 08:02:46 [31424] vmx-occ-005       crmd:     info:
do_lrm_rsc_op:        Performing
key=3:119728:0:a5b747ee-4fbc-4f65-a690-29276791fd19 op=WebSite_stop_0
Jul 27 08:02:46 [31421] vmx-occ-005       lrmd:     info: log_execute:
executing - rsc:WebSite action:stop call_id:14
Jul 27 08:02:46 [31424] vmx-occ-005       crmd:     info:
process_lrm_event:    Operation WebSite_monitor_60000: Cancelled
(node=node1, call=11, confirmed=true)
apache(WebSite)[32511]: 2015/07/27_08:02:46 INFO: apache is not running.
Jul 27 08:02:46 [31421] vmx-occ-005       lrmd:     info:
log_finished:         finished - rsc:WebSite action:stop call_id:14
pid:32511 exit-code:0 exec-time:167ms queue-time:0ms
Jul 27 08:02:46 [31424] vmx-occ-005       crmd:   notice:
process_lrm_event:    Operation WebSite_stop_0: ok (node=node1, call=14,
rc=0, cib-update=16, confirmed=true)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Forwarding cib_modify operation for section status to
master (origin=local/crmd/16)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       Diff: --- 0.38.68 2
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       Diff: +++ 0.38.69 (null)
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=69
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_perform_op:       +
/cib/status/node_state[@id='node1']/lrm[@id='node1']/lrm_resources/lrm_resource[@id='WebSite']/lrm_rsc_op[@id='WebSite_last_0']:
@operation_key=WebSite_stop_0, @operation=stop,
@transition-key=3:119728:0:a5b747ee-4fbc-4f65-a690-29276791fd19,
@transition-magic=0:0;3:119728:0:a5b747ee-4fbc-4f65-a690-29276791fd19,
@call-id=14, @last-run=1437976966, @last-rc-change=1437976966,
@exec-time=167
Jul 27 08:02:46 [31419] vmx-occ-005        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=node1/crmd/16, version=0.38.69)
Jul 27 08:02:51 [31419] vmx-occ-005        cib:     info:
cib_process_ping:     Reporting our current digest to node2:
608e7e54d63c1f66c39c9b4162a189d3 for 0.38.69 (0x846320 0)
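
The attribute updates that attrd sent (the fail count and the last-failure timestamp) can be read out of the log above in the same way. Again a rough sketch under assumptions: it relies on the "Sent update N: name=value" wording seen above, and `attrd_updates` and `UPDATE` are hypothetical names. The timestamp is converted to UTC only to make it readable.

```python
import re
from datetime import datetime, timezone

# Excerpt of the attrd lines from the corosync log above, wrapping left as-is.
LOG = """\
Jul 27 08:02:46 [31422] vmx-occ-005      attrd:   notice:
attrd_perform_update:         Sent update 12: fail-count-WebSite=1
Jul 27 08:02:46 [31422] vmx-occ-005      attrd:   notice:
attrd_perform_update:         Sent update 14:
last-failure-WebSite=1437976962
"""

# Hypothetical pattern, written against the wording in the excerpt only.
UPDATE = re.compile(r"Sent update (?P<id>\d+): (?P<name>[^\s=]+)=(?P<value>\S+)")


def attrd_updates(text):
    """Return {attribute name: value} for every 'Sent update' entry."""
    # Collapse all whitespace so wrapped log entries become one long line.
    flat = " ".join(text.split())
    return {m["name"]: m["value"] for m in UPDATE.finditer(flat)}


updates = attrd_updates(LOG)
# last-failure is stored as seconds since the Unix epoch.
last_failure = datetime.fromtimestamp(
    int(updates["last-failure-WebSite"]), tz=timezone.utc
)
print(updates)
print(last_failure.isoformat())
```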

These are the logs from after I triggered the failure. Pacemaker doesn't
restart the service automatically. Even if I start the httpd service myself,
the status I get is still stopped on node1. If I restart the cluster, it
works fine.




On Mon, Jul 27, 2015 at 11:30 AM, Vijay Partha <vijaysarathy94 at gmail.com>
wrote:

> Could you help me configure stonith properly? I am new to Pacemaker and
> have been working with it for only a few days. Which logs do you
> require?
>
> On Mon, Jul 27, 2015 at 11:22 AM, Digimer <lists at alteeve.ca> wrote:
>
>> On 27/07/15 01:35 AM, Vijay Partha wrote:
>> > Hi,
>> >
>> > My configuration file looks like this:
>> >
>> > <cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="38"
>> > num_updates="0" admin_epoch="0" cib-last-written="Fri Jul 24 15:57:06
>> > 2015" have-quorum="1" dc-uuid="node2">
>> >   <configuration>
>> >     <crm_config>
>> >       <cluster_property_set id="cib-bootstrap-options">
>> >         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>> > value="1.1.11-97629de"/>
>> >         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>> > name="cluster-infrastructure" value="cman"/>
>> >         <nvpair id="cib-bootstrap-options-stonith-enabled"
>> > name="stonith-enabled" value="false"/>
>> >         <nvpair id="cib-bootstrap-options-no-quorum-policy"
>> > name="no-quorum-policy" value="ignore"/>
>> >         <nvpair id="cib-bootstrap-options-cluster-recheck-interval"
>> > name="cluster-recheck-interval" value="2s"/>
>> >       </cluster_property_set>
>> >     </crm_config>
>> >     <nodes>
>> >       <node id="node1" uname="node1"/>
>> >       <node id="node2" uname="node2"/>
>> >     </nodes>
>> >     <resources>
>> >       <primitive class="ocf" id="my_first_svc" provider="heartbeat"
>> > type="Dummy">
>> >         <instance_attributes id="my_first_svc-instance_attributes"/>
>> >         <operations>
>> >           <op id="my_first_svc-start-timeout-20" interval="0s"
>> > name="start" timeout="20"/>
>> >           <op id="my_first_svc-stop-timeout-20" interval="0s"
>> > name="stop" timeout="20"/>
>> >           <op id="my_first_svc-monitor-interval-120s" interval="120s"
>> > name="monitor"/>
>> >         </operations>
>> >       </primitive>
>> >       <primitive class="ocf" id="WebSite" provider="heartbeat"
>> > type="apache">
>> >         <instance_attributes id="WebSite-instance_attributes">
>> >           <nvpair id="WebSite-instance_attributes-configfile"
>> > name="configfile" value="/etc/httpd/conf/httpd.conf"/>
>> >           <nvpair id="WebSite-instance_attributes-statusurl"
>> > name="statusurl" value="http://localhost/server-status"/>
>>
>> >         </instance_attributes>
>> >         <operations>
>> >           <op id="WebSite-start-timeout-40s" interval="0s" name="start"
>> > timeout="40s" on-fail="restart"/>
>> >           <op id="WebSite-stop-timeout-60s" interval="0s" name="stop"
>> > timeout="60s" on-fail="restart"/>
>> >           <op id="WebSite-monitor-interval-1min" interval="1min"
>> > name="monitor" on-fail="restart"/>
>> >         </operations>
>> >         <meta_attributes id="WebSite-meta_attributes"/>
>> >       </primitive>
>> >     </resources>
>> >     <constraints>
>> >       <rsc_location id="location-WebSite-node2-50" node="node2"
>> > rsc="WebSite" score="50"/>
>> >     </constraints>
>> >     <rsc_defaults>
>> >       <meta_attributes id="rsc_defaults-options">
>> >         <nvpair id="rsc_defaults-options-migration-threshold"
>> > name="migration-threshold" value="1"/>
>> >       </meta_attributes>
>> >     </rsc_defaults>
>> >     <op_defaults>
>> >       <meta_attributes id="op_defaults-options">
>> >         <nvpair id="op_defaults-options-timeout" name="timeout"
>> > value="240s"/>
>> >       </meta_attributes>
>> >     </op_defaults>
>> >   </configuration>
>> > </cib>
>> >
>> > Once I stop the httpd service, Pacemaker does not restart it
>> > automatically.
>>
>> As mentioned, logs help a lot. Please post the logs from all nodes,
>> starting before you trigger the failure and running until after the logs
>> stop printing.
>>
>> Also, you must use stonith. Please configure and test it. Often problems
>> go away when stonith is configured and working properly.
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>
>
> --
> With Regards
> P.Vijay
>



-- 
With Regards
P.Vijay