[Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.

Fri Jul 4 04:30:59 UTC 2014

On 2 Jul 2014, at 6:32 am, Paul E Cain <pecain at us.ibm.com> wrote:

> Hi Andrew,
> 
> Thanks for fixing that.
> 
> I downloaded the build from this patch (pacemaker-2a5bbf93cb1bfee4ff57425d4460122d0fba57ab.zip) and compiled it from source. This time ha3_fabric_ping tried to start as expected and then failed as expected. However, the stonith still occurred even though fencing_route_to_ha4 wasn't up and running. Looking at the logs, it looks like STONITH can still fence even if fencing_route_to_ha4 isn't running. Unless you have any other suggestions, I'm thinking the best solution is to add a few lines to the fence agent code so that it pings 10.10.0.1 before fencing and, if the ping fails, skips the fence and returns a failure code.

You forgot to configure startup-fencing=false this time ;-)
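
With startup-fencing left at its default of true, the cluster fences nodes it has never seen, which would match ha4 being scheduled for STONITH right after ha3 comes up alone. A minimal sketch of the missing property, as it might sit in the crm_config section of the quoted CIB; the nvpair id is an assumption that simply follows the cib-bootstrap-options-* naming used by the other entries:

```xml
<cluster_property_set id="cib-bootstrap-options">
  <!-- existing nvpairs (stonith-enabled, no-quorum-policy, ...) stay unchanged -->
  <!-- assumed id, chosen to match the naming pattern of the existing nvpairs -->
  <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="false"/>
</cluster_property_set>
```

Note that the Pacemaker documentation marks startup-fencing=false as an advanced, unsafe setting, since a node that was never seen is then assumed to be cleanly down.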

> 
> [root@ha3 crmsh-2.0.0]# crm_mon -1
> Last updated: Tue Jul  1 14:42:52 2014
> Last change: Tue Jul  1 14:33:06 2014
> Stack: corosync
> Current DC: ha3 (1) - partition WITHOUT quorum
> Version: 1.1.11-2a5bbf9
> 2 Nodes configured
> 4 Resources configured
> 
> 
> Node ha4 (2): UNCLEAN (offline)
> Online: [ ha3 ]
> 
> 
> Failed actions:
>     ha3_fabric_ping_start_0 on ha3 'unknown error' (1): call=18, status=complete, last-rc-change='Tue Jul  1 14:36:53 2014', queued=0ms, exec=20027ms
> 
> 
> <cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="10" num_updates="15" admin_epoch="0" cib-last-written="Tue Jul  1 14:33:06 2014" have-quorum="0" dc-uuid="1">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair name="symmetric-cluster" value="true" id="cib-bootstrap-options-symmetric-cluster"/>
>         <nvpair name="stonith-enabled" value="true" id="cib-bootstrap-options-stonith-enabled"/>
>         <nvpair name="stonith-action" value="reboot" id="cib-bootstrap-options-stonith-action"/>
>         <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
>         <nvpair name="stop-orphan-resources" value="true" id="cib-bootstrap-options-stop-orphan-resources"/>
>         <nvpair name="stop-orphan-actions" value="true" id="cib-bootstrap-options-stop-orphan-actions"/>
>         <nvpair name="default-action-timeout" value="20s" id="cib-bootstrap-options-default-action-timeout"/>
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-2a5bbf9"/>
>         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1404242903"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="1" uname="ha3"/>
>       <node id="2" uname="ha4"/>
>     </nodes>
>     <resources>
>       <primitive id="ha3_fabric_ping" class="ocf" provider="pacemaker" type="ping">
>         <instance_attributes id="ha3_fabric_ping-instance_attributes">
>           <nvpair name="host_list" value="10.10.0.1" id="ha3_fabric_ping-instance_attributes-host_list"/>
>           <nvpair name="failure_score" value="1" id="ha3_fabric_ping-instance_attributes-failure_score"/>
>         </instance_attributes>
>         <operations>
>           <op name="start" timeout="60s" requires="nothing" interval="0" id="ha3_fabric_ping-start-0">
>             <instance_attributes id="ha3_fabric_ping-start-0-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-start-0-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>           <op name="monitor" interval="15s" requires="nothing" timeout="15s" id="ha3_fabric_ping-monitor-15s">
>             <instance_attributes id="ha3_fabric_ping-monitor-15s-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-monitor-15s-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>           <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha3_fabric_ping-stop-0">
>             <instance_attributes id="ha3_fabric_ping-stop-0-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="ha3_fabric_ping-stop-0-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>         </operations>
>       </primitive>
>       <primitive id="ha4_fabric_ping" class="ocf" provider="pacemaker" type="ping">
>         <instance_attributes id="ha4_fabric_ping-instance_attributes">
>           <nvpair name="host_list" value="10.10.0.1" id="ha4_fabric_ping-instance_attributes-host_list"/>
>           <nvpair name="failure_score" value="1" id="ha4_fabric_ping-instance_attributes-failure_score"/>
>         </instance_attributes>
>         <operations>
>           <op name="start" timeout="60s" requires="nothing" interval="0" id="ha4_fabric_ping-start-0">
>             <instance_attributes id="ha4_fabric_ping-start-0-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-start-0-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>           <op name="monitor" interval="15s" requires="nothing" timeout="15s" id="ha4_fabric_ping-monitor-15s">
>             <instance_attributes id="ha4_fabric_ping-monitor-15s-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-monitor-15s-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>           <op name="stop" on-fail="fence" requires="nothing" interval="0" id="ha4_fabric_ping-stop-0">
>             <instance_attributes id="ha4_fabric_ping-stop-0-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="ha4_fabric_ping-stop-0-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>         </operations>
>       </primitive>
>       <primitive id="fencing_route_to_ha3" class="stonith" type="meatware">
>         <instance_attributes id="fencing_route_to_ha3-instance_attributes">
>           <nvpair name="hostlist" value="ha3" id="fencing_route_to_ha3-instance_attributes-hostlist"/>
>         </instance_attributes>
>         <operations>
>           <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha3-start-0">
>             <instance_attributes id="fencing_route_to_ha3-start-0-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-start-0-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>           <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha3-monitor-0">
>             <instance_attributes id="fencing_route_to_ha3-monitor-0-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="fencing_route_to_ha3-monitor-0-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>         </operations>
>       </primitive>
>       <primitive id="fencing_route_to_ha4" class="stonith" type="meatware">
>         <instance_attributes id="fencing_route_to_ha4-instance_attributes">
>           <nvpair name="hostlist" value="ha4" id="fencing_route_to_ha4-instance_attributes-hostlist"/>
>         </instance_attributes>
>         <operations>
>           <op name="start" requires="nothing" interval="0" id="fencing_route_to_ha4-start-0">
>             <instance_attributes id="fencing_route_to_ha4-start-0-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-start-0-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>           <op name="monitor" requires="nothing" interval="0" id="fencing_route_to_ha4-monitor-0">
>             <instance_attributes id="fencing_route_to_ha4-monitor-0-instance_attributes">
>               <nvpair name="prereq" value="nothing" id="fencing_route_to_ha4-monitor-0-instance_attributes-prereq"/>
>             </instance_attributes>
>           </op>
>         </operations>
>       </primitive>
>     </resources>
>     <constraints>
>       <rsc_location id="ha3_fabric_ping_location" rsc="ha3_fabric_ping" score="INFINITY" node="ha3"/>
>       <rsc_location id="ha3_fabric_ping_not_location" rsc="ha3_fabric_ping" score="-INFINITY" node="ha4"/>
>       <rsc_location id="ha4_fabric_ping_location" rsc="ha4_fabric_ping" score="INFINITY" node="ha4"/>
>       <rsc_location id="ha4_fabric_ping_not_location" rsc="ha4_fabric_ping" score="-INFINITY" node="ha3"/>
>       <rsc_location id="fencing_route_to_ha4_location" rsc="fencing_route_to_ha4" score="INFINITY" node="ha3"/>
>       <rsc_location id="fencing_route_to_ha4_not_location" rsc="fencing_route_to_ha4" score="-INFINITY" node="ha4"/>
>       <rsc_location id="fencing_route_to_ha3_location" rsc="fencing_route_to_ha3" score="INFINITY" node="ha4"/>
>       <rsc_location id="fencing_route_to_ha3_not_location" rsc="fencing_route_to_ha3" score="-INFINITY" node="ha3"/>
>       <rsc_order id="ha3_fabric_ping_before_fencing_route_to_ha4" score="INFINITY" first="ha3_fabric_ping" first-action="start" then="fencing_route_to_ha4" then-action="start"/>
>       <rsc_order id="ha4_fabric_ping_before_fencing_route_to_ha3" score="INFINITY" first="ha4_fabric_ping" first-action="start" then="fencing_route_to_ha3" then-action="start"/>
>     </constraints>
>   </configuration>
>   <status>
>     <node_state id="1" uname="ha3" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
>       <lrm id="1">
>         <lrm_resources>
>           <lrm_resource id="ha3_fabric_ping" type="ping" class="ocf" provider="pacemaker">
>             <lrm_rsc_op id="ha3_fabric_ping_last_0" operation_key="ha3_fabric_ping_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="1:1:0:81a6b215-3955-42b9-871b-9d127ef97e40" transition-magic="0:0;1:1:0:81a6b215-3955-42b9-871b-9d127ef97e40" call-id="19" rc-code="0" op-status="0" interval="0" last-run="1404243473" last-rc-change="1404243473" exec-time="63" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58" on_node="ha3"/>
>             <lrm_rsc_op id="ha3_fabric_ping_last_failure_0" operation_key="ha3_fabric_ping_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="8:0:0:81a6b215-3955-42b9-871b-9d127ef97e40" transition-magic="0:1;8:0:0:81a6b215-3955-42b9-871b-9d127ef97e40" call-id="18" rc-code="1" op-status="0" interval="0" last-run="1404243413" last-rc-change="1404243413" exec-time="20027" queue-time="0" op-digest="ddf4bee6852a62c7efcf52cf7471d629"/>
>           </lrm_resource>
>           <lrm_resource id="ha4_fabric_ping" type="ping" class="ocf" provider="pacemaker">
>             <lrm_rsc_op id="ha4_fabric_ping_last_0" operation_key="ha4_fabric_ping_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="5:0:7:81a6b215-3955-42b9-871b-9d127ef97e40" transition-magic="0:7;5:0:7:81a6b215-3955-42b9-871b-9d127ef97e40" call-id="9" rc-code="7" op-status="0" interval="0" last-run="1404243413" last-rc-change="1404243413" exec-time="8" queue-time="0" op-digest="91b00b3fe95f23582466d18e42c4fd58" on_node="ha3"/>
>           </lrm_resource>
>           <lrm_resource id="fencing_route_to_ha3" type="meatware" class="stonith">
>             <lrm_rsc_op id="fencing_route_to_ha3_last_0" operation_key="fencing_route_to_ha3_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="6:0:7:81a6b215-3955-42b9-871b-9d127ef97e40" transition-magic="0:7;6:0:7:81a6b215-3955-42b9-871b-9d127ef97e40" call-id="13" rc-code="7" op-status="0" interval="0" last-run="1404243413" last-rc-change="1404243413" exec-time="1" queue-time="0" op-digest="502fbd7a2366c2be772d7fbecc9e0351" on_node="ha3"/>
>           </lrm_resource>
>           <lrm_resource id="fencing_route_to_ha4" type="meatware" class="stonith">
>             <lrm_rsc_op id="fencing_route_to_ha4_last_0" operation_key="fencing_route_to_ha4_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="7:0:7:81a6b215-3955-42b9-871b-9d127ef97e40" transition-magic="0:7;7:0:7:81a6b215-3955-42b9-871b-9d127ef97e40" call-id="17" rc-code="7" op-status="0" interval="0" last-run="1404243413" last-rc-change="1404243413" exec-time="0" queue-time="0" op-digest="5be26fbcfd648e3d545d0115645dde76" on_node="ha3"/>
>           </lrm_resource>
>         </lrm_resources>
>       </lrm>
>       <transient_attributes id="1">
>         <instance_attributes id="status-1">
>           <nvpair id="status-1-shutdown" name="shutdown" value="0"/>
>           <nvpair id="status-1-probe_complete" name="probe_complete" value="true"/>
>           <nvpair id="status-1-fail-count-ha3_fabric_ping" name="fail-count-ha3_fabric_ping" value="INFINITY"/>
>           <nvpair id="status-1-last-failure-ha3_fabric_ping" name="last-failure-ha3_fabric_ping" value="1404243433"/>
>         </instance_attributes>
>       </transient_attributes>
>     </node_state>
>   </status>
> </cib>
> 
> 
> (PS: I added extra logging to your patch; its output can be seen in the log file below.)
>         if (action->needs == rsc_req_nothing) {
>             crm_notice("%s needs nothing", action->uuid);
>         } else if (action->needs == rsc_req_stonith) {
>             crm_notice("%s needs stonith", action->uuid);
>             order_actions(stonith_done, action, pe_order_optional);
> 
> /var/log/messages 
> Jul  1 14:34:56 ha3 corosync[4638]: [QB    ] withdrawing server sockets
> Jul  1 14:34:56 ha3 corosync[4638]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
> Jul  1 14:34:56 ha3 corosync[4638]: [SERV  ] Service engine unloaded: corosync profile loading service
> Jul  1 14:34:56 ha3 corosync[4638]: [MAIN  ] Corosync Cluster Engine exiting normally
> Jul  1 14:34:57 ha3 corosync: Waiting for corosync services to unload:.[  OK  ]
> Jul  1 14:34:57 ha3 systemd: Stopped LSB: Starts and stops Corosync Cluster Engine..
> Jul  1 14:35:01 ha3 systemd-logind: Removed session 11.
> Jul  1 14:35:37 ha3 systemd: Starting Session 12 of user root.
> Jul  1 14:35:37 ha3 systemd: Started Session 12 of user root.
> Jul  1 14:35:37 ha3 systemd-logind: New session 12 of user root.
> Jul  1 14:36:24 ha3 systemd: Starting LSB: Starts and stops Corosync Cluster Engine....
> Jul  1 14:36:24 ha3 corosync[4924]: [MAIN  ] Corosync Cluster Engine ('2.3.3'): started and ready to provide service.
> Jul  1 14:36:24 ha3 corosync[4924]: [MAIN  ] Corosync built-in features: pie relro bindnow
> Jul  1 14:36:24 ha3 corosync[4925]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Jul  1 14:36:24 ha3 corosync[4925]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> Jul  1 14:36:25 ha3 corosync[4925]: [TOTEM ] The network interface [10.10.0.14] is now up.
> Jul  1 14:36:25 ha3 corosync[4925]: [SERV  ] Service engine loaded: corosync configuration map access [0]
> Jul  1 14:36:25 ha3 corosync[4925]: [QB    ] server name: cmap
> Jul  1 14:36:25 ha3 corosync[4925]: [SERV  ] Service engine loaded: corosync configuration service [1]
> Jul  1 14:36:25 ha3 corosync[4925]: [QB    ] server name: cfg
> Jul  1 14:36:25 ha3 corosync[4925]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Jul  1 14:36:25 ha3 corosync[4925]: [QB    ] server name: cpg
> Jul  1 14:36:25 ha3 corosync[4925]: [SERV  ] Service engine loaded: corosync profile loading service [4]
> Jul  1 14:36:25 ha3 corosync[4925]: [QUORUM] Using quorum provider corosync_votequorum
> Jul  1 14:36:25 ha3 corosync[4925]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Jul  1 14:36:25 ha3 corosync[4925]: [QB    ] server name: votequorum
> Jul  1 14:36:25 ha3 corosync[4925]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Jul  1 14:36:25 ha3 corosync[4925]: [QB    ] server name: quorum
> Jul  1 14:36:25 ha3 corosync[4925]: [TOTEM ] adding new UDPU member {10.10.0.14}
> Jul  1 14:36:25 ha3 corosync[4925]: [TOTEM ] adding new UDPU member {10.10.0.15}
> Jul  1 14:36:25 ha3 corosync[4925]: [TOTEM ] A new membership (10.10.0.14:2420) was formed. Members joined: 1
> Jul  1 14:36:25 ha3 corosync[4925]: [QUORUM] Members[1]: 1
> Jul  1 14:36:25 ha3 corosync[4925]: [MAIN  ] Completed service synchronization, ready to provide service.
> Jul  1 14:36:25 ha3 corosync: Starting Corosync Cluster Engine (corosync): [  OK  ]
> Jul  1 14:36:25 ha3 systemd: Started LSB: Starts and stops Corosync Cluster Engine..
> Jul  1 14:36:30 ha3 systemd: Starting LSB: Starts and stops Pacemaker Cluster Manager....
> Jul  1 14:36:30 ha3 pacemaker: Starting Pacemaker Cluster Manager
> Jul  1 14:36:30 ha3 pacemakerd[4953]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
> Jul  1 14:36:30 ha3 pacemakerd[4953]: notice: mcp_read_config: Configured corosync to accept connections from group 1000: OK (1)
> Jul  1 14:36:30 ha3 pacemakerd[4953]: notice: main: Starting Pacemaker 1.1.11 (Build: 2a5bbf9):  agent-manpages ncurses libqb-logging libqb-ipc lha-fencing nagios  corosync-native libesmtp acls
> Jul  1 14:36:30 ha3 pacemakerd[4953]: notice: cluster_connect_quorum: Quorum lost
> Jul  1 14:36:30 ha3 pacemakerd[4953]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[1] - state is now member (was (null))
> Jul  1 14:36:30 ha3 crmd[4960]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
> Jul  1 14:36:30 ha3 crmd[4960]: notice: main: CRM Git Version: 2a5bbf9
> Jul  1 14:36:30 ha3 crmd[4960]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> Jul  1 14:36:30 ha3 crmd[4960]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> Jul  1 14:36:30 ha3 stonith-ng[4956]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
> Jul  1 14:36:30 ha3 stonith-ng[4956]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jul  1 14:36:30 ha3 lrmd[4957]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
> Jul  1 14:36:30 ha3 attrd[4958]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
> Jul  1 14:36:30 ha3 attrd[4958]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jul  1 14:36:30 ha3 cib[4955]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
> Jul  1 14:36:30 ha3 cib[4955]: warning: crm_is_writable: /var/lib/pacemaker/cib should be owned and r/w by group haclient
> Jul  1 14:36:30 ha3 cib[4955]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jul  1 14:36:30 ha3 pengine[4959]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
> Jul  1 14:36:30 ha3 pengine[4959]: warning: crm_is_writable: /var/lib/pacemaker/pengine should be owned and r/w by group haclient
> Jul  1 14:36:30 ha3 attrd[4958]: notice: crm_update_peer_state: attrd_peer_change_cb: Node ha3[1] - state is now member (was (null))
> Jul  1 14:36:31 ha3 crmd[4960]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jul  1 14:36:31 ha3 crmd[4960]: notice: cluster_connect_quorum: Quorum lost
> Jul  1 14:36:31 ha3 crmd[4960]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[1] - state is now member (was (null))
> Jul  1 14:36:31 ha3 crmd[4960]: notice: do_started: The local CRM is operational
> Jul  1 14:36:31 ha3 crmd[4960]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jul  1 14:36:31 ha3 stonith-ng[4956]: notice: setup_cib: Watching for stonith topology changes
> Jul  1 14:36:31 ha3 stonith-ng[4956]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul  1 14:36:32 ha3 stonith-ng[4956]: notice: stonith_device_register: Added 'fencing_route_to_ha4' to the device list (1 active devices)
> Jul  1 14:36:35 ha3 pacemaker: Starting Pacemaker Cluster Manager[  OK  ]
> Jul  1 14:36:35 ha3 systemd: Started LSB: Starts and stops Pacemaker Cluster Manager..
> Jul  1 14:36:52 ha3 crmd[4960]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jul  1 14:36:52 ha3 crmd[4960]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Jul  1 14:36:52 ha3 crmd[4960]: warning: do_log: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> Jul  1 14:36:53 ha3 pengine[4959]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul  1 14:36:53 ha3 pengine[4959]: warning: stage6: Scheduling Node ha4 for STONITH
> Jul  1 14:36:53 ha3 pengine[4959]: notice: native_start_constraints: ha3_fabric_ping_monitor_15000 needs nothing
> Jul  1 14:36:53 ha3 pengine[4959]: notice: native_start_constraints: ha3_fabric_ping_start_0 needs nothing
> Jul  1 14:36:53 ha3 pengine[4959]: notice: native_start_constraints: ha3_fabric_ping_monitor_0 needs nothing
> Jul  1 14:36:53 ha3 pengine[4959]: notice: native_start_constraints: ha4_fabric_ping_monitor_0 needs nothing
> Jul  1 14:36:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha3_monitor_0 needs nothing
> Jul  1 14:36:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha4_start_0 needs nothing
> Jul  1 14:36:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha4_monitor_0 needs nothing
> Jul  1 14:36:53 ha3 pengine[4959]: notice: LogActions: Start   ha3_fabric_ping	(ha3)
> Jul  1 14:36:53 ha3 pengine[4959]: notice: LogActions: Start   fencing_route_to_ha4	(ha3)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: te_rsc_command: Initiating action 4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: te_fence_node: Executing reboot fencing operation (12) on ha4 (timeout=60000)
> Jul  1 14:36:53 ha3 stonith-ng[4956]: notice: handle_request: Client crmd.4960.55d3ab19 wants to fence (reboot) 'ha4' with device '(any)'
> Jul  1 14:36:53 ha3 stonith-ng[4956]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: 3eb51036-6c4a-40ee-a0dc-bc1838cf13df (0)
> Jul  1 14:36:53 ha3 pengine[4959]: warning: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-202.bz2
> Jul  1 14:36:53 ha3 stonith: [4972]: info: parse config info info=ha4
> Jul  1 14:36:53 ha3 stonith: [4972]: info: meatware device OK.
> Jul  1 14:36:53 ha3 stonith: [4977]: info: parse config info info=ha4
> Jul  1 14:36:53 ha3 stonith: [4977]: info: meatware device OK.
> Jul  1 14:36:53 ha3 stonith: [4983]: info: parse config info info=ha4
> Jul  1 14:36:53 ha3 stonith: [4983]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jul  1 14:36:53 ha3 stonith: [4983]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jul  1 14:36:53 ha3 crmd[4960]: notice: process_lrm_event: Operation ha3_fabric_ping_monitor_0: not running (node=ha3, call=5, rc=7, cib-update=25, confirmed=true)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: te_rsc_command: Initiating action 5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: process_lrm_event: Operation ha4_fabric_ping_monitor_0: not running (node=ha3, call=9, rc=7, cib-update=26, confirmed=true)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: te_rsc_command: Initiating action 6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: process_lrm_event: Operation fencing_route_to_ha3_monitor_0: not running (node=ha3, call=13, rc=7, cib-update=27, confirmed=true)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: te_rsc_command: Initiating action 7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: process_lrm_event: Operation fencing_route_to_ha4_monitor_0: not running (node=ha3, call=17, rc=7, cib-update=28, confirmed=true)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete-ha3 on ha3 (local) - no waiting
> Jul  1 14:36:53 ha3 crmd[4960]: notice: te_rsc_command: Initiating action 8: start ha3_fabric_ping_start_0 on ha3 (local)
> Jul  1 14:36:53 ha3 crmd[4960]: notice: abort_transition_graph: Transition aborted by status-1-probe_complete, probe_complete=true: Transient attribute change (create cib=0.10.9, source=te_update_diff:391, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1'], 0)
> Jul  1 14:37:13 ha3 ping(ha3_fabric_ping)[5003]: WARNING: pingd is less than failure_score(1)
> Jul  1 14:37:13 ha3 crmd[4960]: notice: process_lrm_event: Operation ha3_fabric_ping_start_0: unknown error (node=ha3, call=18, rc=1, cib-update=29, confirmed=true)
> Jul  1 14:37:13 ha3 crmd[4960]: warning: status_from_rc: Action 8 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jul  1 14:37:13 ha3 crmd[4960]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1404243433)
> Jul  1 14:37:13 ha3 crmd[4960]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1404243433)
> Jul  1 14:37:13 ha3 crmd[4960]: warning: status_from_rc: Action 8 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> Jul  1 14:37:13 ha3 crmd[4960]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1404243433)
> Jul  1 14:37:13 ha3 crmd[4960]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1404243433)
> Jul  1 14:37:53 ha3 stonith-ng[4956]: notice: stonith_action_async_done: Child process 4979 performing action 'reboot' timed out with signal 15
> Jul  1 14:37:53 ha3 stonith-ng[4956]: error: log_operation: Operation 'reboot' [4979] (call 2 from crmd.4960) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jul  1 14:37:53 ha3 stonith-ng[4956]: warning: log_operation: fencing_route_to_ha4:4979 [ Performing: stonith -t meatware -T reset ha4 ]
> Jul  1 14:37:53 ha3 stonith-ng[4956]: warning: get_xpath_object: No match for //@st_delegate in /st-reply
> Jul  1 14:37:53 ha3 stonith-ng[4956]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.4960 at ha3.3eb51036: Timer expired
> Jul  1 14:37:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 2/12:0:0:81a6b215-3955-42b9-871b-9d127ef97e40: Timer expired (-62)
> Jul  1 14:37:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 2 for ha4 failed (Timer expired): aborting transition.
> Jul  1 14:37:53 ha3 crmd[4960]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: Timer expired (ref=3eb51036-6c4a-40ee-a0dc-bc1838cf13df) by client crmd.4960
> Jul  1 14:37:53 ha3 crmd[4960]: notice: run_graph: Transition 0 (Complete=8, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-202.bz2): Stopped
> Jul  1 14:37:53 ha3 pengine[4959]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul  1 14:37:53 ha3 pengine[4959]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jul  1 14:37:53 ha3 pengine[4959]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jul  1 14:37:53 ha3 pengine[4959]: warning: common_apply_stickiness: Forcing ha3_fabric_ping away from ha3 after 1000000 failures (max=1000000)
> Jul  1 14:37:53 ha3 pengine[4959]: warning: stage6: Scheduling Node ha4 for STONITH
> Jul  1 14:37:53 ha3 pengine[4959]: notice: native_start_constraints: ha3_fabric_ping_stop_0 needs nothing
> Jul  1 14:37:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha4_start_0 needs nothing
> Jul  1 14:37:53 ha3 pengine[4959]: notice: LogActions: Stop    ha3_fabric_ping	(ha3)
> Jul  1 14:37:53 ha3 pengine[4959]: notice: LogActions: Start   fencing_route_to_ha4	(ha3 - blocked)
> Jul  1 14:37:53 ha3 pengine[4959]: warning: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-203.bz2
> Jul  1 14:37:53 ha3 crmd[4960]: notice: te_rsc_command: Initiating action 1: stop ha3_fabric_ping_stop_0 on ha3 (local)
> Jul  1 14:37:53 ha3 crmd[4960]: notice: te_fence_node: Executing reboot fencing operation (7) on ha4 (timeout=60000)
> Jul  1 14:37:53 ha3 stonith-ng[4956]: notice: handle_request: Client crmd.4960.55d3ab19 wants to fence (reboot) 'ha4' with device '(any)'
> Jul  1 14:37:53 ha3 stonith-ng[4956]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: a52da012-efbc-448b-843a-9f85d828b9af (0)
> Jul  1 14:37:53 ha3 stonith: [5040]: info: parse config info info=ha4
> Jul  1 14:37:53 ha3 stonith: [5040]: info: meatware device OK.
> Jul  1 14:37:53 ha3 stonith: [5045]: info: parse config info info=ha4
> Jul  1 14:37:53 ha3 stonith: [5045]: info: meatware device OK.
> Jul  1 14:37:53 ha3 stonith: [5051]: info: parse config info info=ha4
> Jul  1 14:37:53 ha3 stonith: [5051]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jul  1 14:37:53 ha3 stonith: [5051]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jul  1 14:37:53 ha3 crmd[4960]: notice: process_lrm_event: Operation ha3_fabric_ping_stop_0: ok (node=ha3, call=19, rc=0, cib-update=31, confirmed=true)
> Jul  1 14:37:58 ha3 crmd[4960]: notice: abort_transition_graph: Transition aborted by deletion of nvpair[@id='status-1-pingd']: Transient attribute change (cib=0.10.15, source=te_update_diff:391, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-pingd'], 0)
> Jul  1 14:38:53 ha3 stonith-ng[4956]: notice: stonith_action_async_done: Child process 5047 performing action 'reboot' timed out with signal 15
> Jul  1 14:38:53 ha3 stonith-ng[4956]: error: log_operation: Operation 'reboot' [5047] (call 3 from crmd.4960) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jul  1 14:38:53 ha3 stonith-ng[4956]: warning: log_operation: fencing_route_to_ha4:5047 [ Performing: stonith -t meatware -T reset ha4 ]
> Jul  1 14:38:53 ha3 stonith-ng[4956]: warning: get_xpath_object: No match for //@st_delegate in /st-reply
> Jul  1 14:38:53 ha3 stonith-ng[4956]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.4960 at ha3.a52da012: Timer expired
> Jul  1 14:38:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 3/7:1:0:81a6b215-3955-42b9-871b-9d127ef97e40: Timer expired (-62)
> Jul  1 14:38:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 3 for ha4 failed (Timer expired): aborting transition.
> Jul  1 14:38:53 ha3 crmd[4960]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: Timer expired (ref=a52da012-efbc-448b-843a-9f85d828b9af) by client crmd.4960
> Jul  1 14:38:53 ha3 crmd[4960]: notice: run_graph: Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-203.bz2): Stopped
> Jul  1 14:38:53 ha3 pengine[4959]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul  1 14:38:53 ha3 pengine[4959]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jul  1 14:38:53 ha3 pengine[4959]: warning: common_apply_stickiness: Forcing ha3_fabric_ping away from ha3 after 1000000 failures (max=1000000)
> Jul  1 14:38:53 ha3 pengine[4959]: warning: stage6: Scheduling Node ha4 for STONITH
> Jul  1 14:38:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha4_start_0 needs nothing
> Jul  1 14:38:53 ha3 pengine[4959]: notice: LogActions: Start   fencing_route_to_ha4	(ha3 - blocked)
> Jul  1 14:38:53 ha3 crmd[4960]: notice: te_fence_node: Executing reboot fencing operation (6) on ha4 (timeout=60000)
> Jul  1 14:38:53 ha3 stonith-ng[4956]: notice: handle_request: Client crmd.4960.55d3ab19 wants to fence (reboot) 'ha4' with device '(any)'
> Jul  1 14:38:53 ha3 stonith-ng[4956]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: 7d74f7e7-354c-4aed-805e-376d78a268d6 (0)
> Jul  1 14:38:53 ha3 pengine[4959]: warning: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-204.bz2
> Jul  1 14:38:53 ha3 stonith: [5056]: info: parse config info info=ha4
> Jul  1 14:38:53 ha3 stonith: [5056]: info: meatware device OK.
> Jul  1 14:38:53 ha3 stonith: [5058]: info: parse config info info=ha4
> Jul  1 14:38:53 ha3 stonith: [5058]: info: meatware device OK.
> Jul  1 14:38:53 ha3 stonith: [5060]: info: parse config info info=ha4
> Jul  1 14:38:53 ha3 stonith: [5060]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jul  1 14:38:53 ha3 stonith: [5060]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jul  1 14:39:53 ha3 stonith-ng[4956]: notice: stonith_action_async_done: Child process 5059 performing action 'reboot' timed out with signal 15
> Jul  1 14:39:53 ha3 stonith-ng[4956]: error: log_operation: Operation 'reboot' [5059] (call 4 from crmd.4960) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jul  1 14:39:53 ha3 stonith-ng[4956]: warning: log_operation: fencing_route_to_ha4:5059 [ Performing: stonith -t meatware -T reset ha4 ]
> Jul  1 14:39:53 ha3 stonith-ng[4956]: warning: get_xpath_object: No match for //@st_delegate in /st-reply
> Jul  1 14:39:53 ha3 stonith-ng[4956]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.4960 at ha3.7d74f7e7: Timer expired
> Jul  1 14:39:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 4/6:2:0:81a6b215-3955-42b9-871b-9d127ef97e40: Timer expired (-62)
> Jul  1 14:39:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 4 for ha4 failed (Timer expired): aborting transition.
> Jul  1 14:39:53 ha3 crmd[4960]: notice: abort_transition_graph: Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
> Jul  1 14:39:53 ha3 crmd[4960]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: Timer expired (ref=7d74f7e7-354c-4aed-805e-376d78a268d6) by client crmd.4960
> Jul  1 14:39:53 ha3 crmd[4960]: notice: run_graph: Transition 2 (Complete=1, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-204.bz2): Stopped
> Jul  1 14:39:53 ha3 pengine[4959]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul  1 14:39:53 ha3 pengine[4959]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jul  1 14:39:53 ha3 pengine[4959]: warning: common_apply_stickiness: Forcing ha3_fabric_ping away from ha3 after 1000000 failures (max=1000000)
> Jul  1 14:39:53 ha3 pengine[4959]: warning: stage6: Scheduling Node ha4 for STONITH
> Jul  1 14:39:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha4_start_0 needs nothing
> Jul  1 14:39:53 ha3 pengine[4959]: notice: LogActions: Start   fencing_route_to_ha4	(ha3 - blocked)
> Jul  1 14:39:53 ha3 pengine[4959]: warning: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-warn-204.bz2
> Jul  1 14:39:53 ha3 crmd[4960]: notice: te_fence_node: Executing reboot fencing operation (6) on ha4 (timeout=60000)
> Jul  1 14:39:53 ha3 stonith-ng[4956]: notice: handle_request: Client crmd.4960.55d3ab19 wants to fence (reboot) 'ha4' with device '(any)'
> Jul  1 14:39:53 ha3 stonith-ng[4956]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: 2788d6bb-ac17-450c-beba-10944495a476 (0)
> Jul  1 14:39:53 ha3 stonith: [5062]: info: parse config info info=ha4
> Jul  1 14:39:53 ha3 stonith: [5062]: info: meatware device OK.
> Jul  1 14:39:53 ha3 stonith: [5064]: info: parse config info info=ha4
> Jul  1 14:39:53 ha3 stonith: [5064]: info: meatware device OK.
> Jul  1 14:39:53 ha3 stonith: [5066]: info: parse config info info=ha4
> Jul  1 14:39:53 ha3 stonith: [5066]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jul  1 14:39:53 ha3 stonith: [5066]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jul  1 14:40:53 ha3 stonith-ng[4956]: notice: stonith_action_async_done: Child process 5065 performing action 'reboot' timed out with signal 15
> Jul  1 14:40:53 ha3 stonith-ng[4956]: error: log_operation: Operation 'reboot' [5065] (call 5 from crmd.4960) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jul  1 14:40:53 ha3 stonith-ng[4956]: warning: log_operation: fencing_route_to_ha4:5065 [ Performing: stonith -t meatware -T reset ha4 ]
> Jul  1 14:40:53 ha3 stonith-ng[4956]: warning: get_xpath_object: No match for //@st_delegate in /st-reply
> Jul  1 14:40:53 ha3 stonith-ng[4956]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.4960 at ha3.2788d6bb: Timer expired
> Jul  1 14:40:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 5/6:3:0:81a6b215-3955-42b9-871b-9d127ef97e40: Timer expired (-62)
> Jul  1 14:40:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 5 for ha4 failed (Timer expired): aborting transition.
> Jul  1 14:40:53 ha3 crmd[4960]: notice: abort_transition_graph: Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
> Jul  1 14:40:53 ha3 crmd[4960]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: Timer expired (ref=2788d6bb-ac17-450c-beba-10944495a476) by client crmd.4960
> Jul  1 14:40:53 ha3 crmd[4960]: notice: run_graph: Transition 3 (Complete=1, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-204.bz2): Stopped
> Jul  1 14:40:53 ha3 pengine[4959]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul  1 14:40:53 ha3 pengine[4959]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jul  1 14:40:53 ha3 pengine[4959]: warning: common_apply_stickiness: Forcing ha3_fabric_ping away from ha3 after 1000000 failures (max=1000000)
> Jul  1 14:40:53 ha3 pengine[4959]: warning: stage6: Scheduling Node ha4 for STONITH
> Jul  1 14:40:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha4_start_0 needs nothing
> Jul  1 14:40:53 ha3 pengine[4959]: notice: LogActions: Start   fencing_route_to_ha4	(ha3 - blocked)
> Jul  1 14:40:53 ha3 pengine[4959]: warning: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-warn-204.bz2
> Jul  1 14:40:53 ha3 crmd[4960]: notice: te_fence_node: Executing reboot fencing operation (6) on ha4 (timeout=60000)
> Jul  1 14:40:53 ha3 stonith-ng[4956]: notice: handle_request: Client crmd.4960.55d3ab19 wants to fence (reboot) 'ha4' with device '(any)'
> Jul  1 14:40:53 ha3 stonith-ng[4956]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: f854d478-2620-4662-bd78-068921d554c2 (0)
> Jul  1 14:40:53 ha3 stonith: [5068]: info: parse config info info=ha4
> Jul  1 14:40:53 ha3 stonith: [5068]: info: meatware device OK.
> Jul  1 14:40:53 ha3 stonith: [5070]: info: parse config info info=ha4
> Jul  1 14:40:53 ha3 stonith: [5070]: info: meatware device OK.
> Jul  1 14:40:53 ha3 stonith: [5072]: info: parse config info info=ha4
> Jul  1 14:40:53 ha3 stonith: [5072]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jul  1 14:40:53 ha3 stonith: [5072]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jul  1 14:41:53 ha3 stonith-ng[4956]: notice: stonith_action_async_done: Child process 5071 performing action 'reboot' timed out with signal 15
> Jul  1 14:41:53 ha3 stonith-ng[4956]: error: log_operation: Operation 'reboot' [5071] (call 6 from crmd.4960) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jul  1 14:41:53 ha3 stonith-ng[4956]: warning: log_operation: fencing_route_to_ha4:5071 [ Performing: stonith -t meatware -T reset ha4 ]
> Jul  1 14:41:53 ha3 stonith-ng[4956]: warning: get_xpath_object: No match for //@st_delegate in /st-reply
> Jul  1 14:41:53 ha3 stonith-ng[4956]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.4960 at ha3.f854d478: Timer expired
> Jul  1 14:41:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 6/6:4:0:81a6b215-3955-42b9-871b-9d127ef97e40: Timer expired (-62)
> Jul  1 14:41:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 6 for ha4 failed (Timer expired): aborting transition.
> Jul  1 14:41:53 ha3 crmd[4960]: notice: abort_transition_graph: Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
> Jul  1 14:41:53 ha3 crmd[4960]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: Timer expired (ref=f854d478-2620-4662-bd78-068921d554c2) by client crmd.4960
> Jul  1 14:41:53 ha3 crmd[4960]: notice: run_graph: Transition 4 (Complete=1, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-204.bz2): Stopped
> Jul  1 14:41:53 ha3 pengine[4959]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul  1 14:41:53 ha3 pengine[4959]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jul  1 14:41:53 ha3 pengine[4959]: warning: common_apply_stickiness: Forcing ha3_fabric_ping away from ha3 after 1000000 failures (max=1000000)
> Jul  1 14:41:53 ha3 pengine[4959]: warning: stage6: Scheduling Node ha4 for STONITH
> Jul  1 14:41:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha4_start_0 needs nothing
> Jul  1 14:41:53 ha3 pengine[4959]: notice: LogActions: Start   fencing_route_to_ha4	(ha3 - blocked)
> Jul  1 14:41:53 ha3 pengine[4959]: warning: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-warn-204.bz2
> Jul  1 14:41:53 ha3 crmd[4960]: notice: te_fence_node: Executing reboot fencing operation (6) on ha4 (timeout=60000)
> Jul  1 14:41:53 ha3 stonith-ng[4956]: notice: handle_request: Client crmd.4960.55d3ab19 wants to fence (reboot) 'ha4' with device '(any)'
> Jul  1 14:41:53 ha3 stonith-ng[4956]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: 4670e736-4d12-4ebf-a3f4-3c267384bbec (0)
> Jul  1 14:41:53 ha3 stonith: [5075]: info: parse config info info=ha4
> Jul  1 14:41:53 ha3 stonith: [5075]: info: meatware device OK.
> Jul  1 14:41:53 ha3 stonith: [5077]: info: parse config info info=ha4
> Jul  1 14:41:53 ha3 stonith: [5077]: info: meatware device OK.
> Jul  1 14:41:53 ha3 stonith: [5079]: info: parse config info info=ha4
> Jul  1 14:41:53 ha3 stonith: [5079]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jul  1 14:41:53 ha3 stonith: [5079]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> Jul  1 14:42:53 ha3 stonith-ng[4956]: notice: stonith_action_async_done: Child process 5078 performing action 'reboot' timed out with signal 15
> Jul  1 14:42:53 ha3 stonith-ng[4956]: error: log_operation: Operation 'reboot' [5078] (call 7 from crmd.4960) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> Jul  1 14:42:53 ha3 stonith-ng[4956]: warning: log_operation: fencing_route_to_ha4:5078 [ Performing: stonith -t meatware -T reset ha4 ]
> Jul  1 14:42:53 ha3 stonith-ng[4956]: warning: get_xpath_object: No match for //@st_delegate in /st-reply
> Jul  1 14:42:53 ha3 stonith-ng[4956]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.4960 at ha3.4670e736: Timer expired
> Jul  1 14:42:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 7/6:5:0:81a6b215-3955-42b9-871b-9d127ef97e40: Timer expired (-62)
> Jul  1 14:42:53 ha3 crmd[4960]: notice: tengine_stonith_callback: Stonith operation 7 for ha4 failed (Timer expired): aborting transition.
> Jul  1 14:42:53 ha3 crmd[4960]: notice: abort_transition_graph: Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
> Jul  1 14:42:53 ha3 crmd[4960]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: Timer expired (ref=4670e736-4d12-4ebf-a3f4-3c267384bbec) by client crmd.4960
> Jul  1 14:42:53 ha3 crmd[4960]: notice: run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-204.bz2): Stopped
> Jul  1 14:42:53 ha3 pengine[4959]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul  1 14:42:53 ha3 pengine[4959]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> Jul  1 14:42:53 ha3 pengine[4959]: warning: common_apply_stickiness: Forcing ha3_fabric_ping away from ha3 after 1000000 failures (max=1000000)
> Jul  1 14:42:53 ha3 pengine[4959]: warning: stage6: Scheduling Node ha4 for STONITH
> Jul  1 14:42:53 ha3 pengine[4959]: notice: native_start_constraints: fencing_route_to_ha4_start_0 needs nothing
> Jul  1 14:42:53 ha3 pengine[4959]: notice: LogActions: Start   fencing_route_to_ha4	(ha3 - blocked)
> Jul  1 14:42:53 ha3 pengine[4959]: warning: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-warn-204.bz2
> Jul  1 14:42:53 ha3 crmd[4960]: notice: te_fence_node: Executing reboot fencing operation (6) on ha4 (timeout=60000)
> Jul  1 14:42:53 ha3 stonith-ng[4956]: notice: handle_request: Client crmd.4960.55d3ab19 wants to fence (reboot) 'ha4' with device '(any)'
> Jul  1 14:42:53 ha3 stonith-ng[4956]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: dff125ee-e64f-4c23-80d2-ea38f1bc3437 (0)
> Jul  1 14:42:53 ha3 stonith: [5083]: info: parse config info info=ha4
> Jul  1 14:42:53 ha3 stonith: [5083]: info: meatware device OK.
> Jul  1 14:42:53 ha3 stonith: [5085]: info: parse config info info=ha4
> Jul  1 14:42:53 ha3 stonith: [5085]: info: meatware device OK.
> Jul  1 14:42:53 ha3 stonith: [5087]: info: parse config info info=ha4
> Jul  1 14:42:53 ha3 stonith: [5087]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> Jul  1 14:42:53 ha3 stonith: [5087]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> 
> 
> Paul Cain
> 
> From:	Andrew Beekhof <andrew at beekhof.net>
> To:	The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Date:	06/27/2014 05:13 AM
> Subject:	Re: [Pacemaker] When stonith is enabled, resources won't start until after stonith, even though requires="nothing" and prereq="nothing" on RHEL 7 with pacemaker-1.1.11 compiled from source.
> 
> 
> 
> 
> On 14 Jun 2014, at 7:37 am, Paul E Cain <pecain at us.ibm.com> wrote:
> 
> > Hi Andrew,
> > 
> > Thank you for your quick response. This time, I completely shut down ha4 and then started corosync and pacemaker on ha3. However, the problem still persisted. It's my understanding that using requires="nothing" or prereq="nothing" should allow the cluster to start resources without needing to fence. Is this not correct?
> 
> Apparently not without this patch:
>  https://github.com/ClusterLabs/pacemaker/commit/2a5bbf9
> > > > Jun 11 12:59:03 ha3 stonith-ng[5010]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: cluster_connect_quorum: Quorum acquired
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: pcmk_quorum_notification: Node ha3[168427534] - state is now member (was (null))
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427535
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Could not obtain a node name for corosync nodeid 168427535
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[168427535] - state is now member (was (null))
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_started: The local CRM is operational
> > > > Jun 11 12:59:03 ha3 crmd[5014]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> > > > Jun 11 12:59:04 ha3 stonith-ng[5010]: notice: stonith_device_register: Added 'fencing_route_to_ha4' to the device list (1 active devices)
> > > > Jun 11 12:59:06 ha3 pacemaker: Starting Pacemaker Cluster Manager[  OK  ]
> > > > Jun 11 12:59:06 ha3 systemd: Started LSB: Starts and stops Pacemaker Cluster Manager..
> > > > Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > > > Jun 11 12:59:24 ha3 crmd[5014]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
> > > > Jun 11 12:59:24 ha3 crmd[5014]: warning: do_log: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> > > > Jun 11 12:59:24 ha3 cib[5009]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > > Jun 11 12:59:24 ha3 cib[5009]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > > Jun 11 12:59:24 ha3 attrd[5012]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > > Jun 11 12:59:24 ha3 attrd[5012]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 2 with 1 changes for terminate, id=<n/a>, set=(null)
> > > > Jun 11 12:59:24 ha3 attrd[5012]: notice: write_attribute: Sent update 3 with 1 changes for shutdown, id=<n/a>, set=(null)
> > > > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 2 for terminate[ha3]=(null): OK (0)
> > > > Jun 11 12:59:24 ha3 attrd[5012]: notice: attrd_cib_callback: Update 3 for shutdown[ha3]=0: OK (0)
> > > > Jun 11 12:59:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > > Jun 11 12:59:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for STONITH
> > > > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start   ha3_fabric_ping	   (ha3)
> > > > Jun 11 12:59:25 ha3 pengine[5013]: notice: LogActions: Start   fencing_route_to_ha4	   (ha3)
> > > > Jun 11 12:59:25 ha3 pengine[5013]: warning: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-80.bz2
> > > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: monitor ha3_fabric_ping_monitor_0 on ha3 (local)
> > > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot fencing operation (12) on ha4 (timeout=60000)
> > > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: handle_request: Client crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
> > > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: b3ab6141-9612-4024-82b2-350e74bbb33d (0)
> > > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: corosync_node_name: Unable to get node name for nodeid 168427534
> > > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
> > > > Jun 11 12:59:25 ha3 stonith: [5027]: info: parse config info info=ha4
> > > > Jun 11 12:59:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > > > Jun 11 12:59:25 ha3 stonith: [5031]: info: parse config info info=ha4
> > > > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> > > > Jun 11 12:59:25 ha3 stonith: [5031]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> > > > Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_monitor_0 (call=5, rc=7, cib-update=25, confirmed=true) not running
> > > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 5: monitor ha4_fabric_ping_monitor_0 on ha3 (local)
> > > > Jun 11 12:59:25 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha4_fabric_ping_monitor_0 (call=9, rc=7, cib-update=26, confirmed=true) not running
> > > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 6: monitor fencing_route_to_ha3_monitor_0 on ha3 (local)
> > > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 7: monitor fencing_route_to_ha4_monitor_0 on ha3 (local)
> > > > Jun 11 12:59:25 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on ha3 (local) - no waiting
> > > > Jun 11 12:59:25 ha3 attrd[5012]: notice: write_attribute: Sent update 4 with 1 changes for probe_complete, id=<n/a>, set=(null)
> > > > Jun 11 12:59:25 ha3 attrd[5012]: notice: attrd_cib_callback: Update 4 for probe_complete[ha3]=true: OK (0)
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_action_async_done: Child process 5030 performing action 'reboot' timed out with signal 15
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: log_operation: Operation 'reboot' [5030] (call 2 from crmd.5014) for host 'ha4' with device 'fencing_route_to_ha4' returned: -62 (Timer expired)
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: warning: log_operation: fencing_route_to_ha4:5030 [ Performing: stonith -t meatware -T reset ha4 ]
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: stonith_choose_peer: Couldn't find anyone to fence ha4 with <any>
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: error: remote_op_done: Operation reboot of ha4 by ha3 for crmd.5014 at ha3.b3ab6141: No route to host
> > > > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 2/12:0:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: No route to host (-113)
> > > > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 2 for ha4 failed (No route to host): aborting transition.
> > > > Jun 11 13:00:25 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was not terminated (reboot) by ha3 for ha3: No route to host (ref=b3ab6141-9612-4024-82b2-350e74bbb33d) by client crmd.5014
> > > > Jun 11 13:00:25 ha3 crmd[5014]: notice: run_graph: Transition 0 (Complete=7, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-80.bz2): Stopped
> > > > Jun 11 13:00:25 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > > Jun 11 13:00:25 ha3 pengine[5013]: warning: stage6: Scheduling Node ha4 for STONITH
> > > > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start   ha3_fabric_ping	   (ha3)
> > > > Jun 11 13:00:25 ha3 pengine[5013]: notice: LogActions: Start   fencing_route_to_ha4	   (ha3)
> > > > Jun 11 13:00:25 ha3 pengine[5013]: warning: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-warn-81.bz2
> > > > Jun 11 13:00:25 ha3 crmd[5014]: notice: te_fence_node: Executing reboot fencing operation (8) on ha4 (timeout=60000)
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: handle_request: Client crmd.5014.dbbbf194 wants to fence (reboot) 'ha4' with device '(any)'
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for ha4: eae78d4c-8d80-47fe-93e9-1a9261ec38a4 (0)
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > > > Jun 11 13:00:25 ha3 stonith-ng[5010]: notice: can_fence_host_with_device: fencing_route_to_ha4 can fence ha4: dynamic-list
> > > > Jun 11 13:00:25 ha3 stonith: [5057]: info: parse config info info=ha4
> > > > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: OPERATOR INTERVENTION REQUIRED to reset ha4.
> > > > Jun 11 13:00:25 ha3 stonith: [5057]: CRIT: Run "meatclient -c ha4" AFTER power-cycling the machine.
> > > > Jun 11 13:00:41 ha3 stonith: [5057]: info: node Meatware-reset: ha4
> > > > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: log_operation: Operation 'reboot' [5056] (call 3 from crmd.5014) for host 'ha4' with device 'fencing_route_to_ha4' returned: 0 (OK)
> > > > Jun 11 13:00:41 ha3 stonith-ng[5010]: notice: remote_op_done: Operation reboot of ha4 by ha3 for crmd.5014 at ha3.eae78d4c: OK
> > > > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_callback: Stonith operation 3/8:1:0:0ebf14dc-cfcf-425a-a507-65ed0ee060aa: OK (0)
> > > > Jun 11 13:00:41 ha3 crmd[5014]: notice: crm_update_peer_state: send_stonith_update: Node ha4[0] - state is now lost (was (null))
> > > > Jun 11 13:00:41 ha3 crmd[5014]: notice: tengine_stonith_notify: Peer ha4 was terminated (reboot) by ha3 for ha3: OK (ref=eae78d4c-8d80-47fe-93e9-1a9261ec38a4) by client crmd.5014
> > > > Jun 11 13:00:41 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: start ha3_fabric_ping_start_0 on ha3 (local)
> > > > Jun 11 13:01:01 ha3 systemd: Starting Session 22 of user root.
> > > > Jun 11 13:01:01 ha3 systemd: Started Session 22 of user root.
> > > > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 5 with 1 changes for pingd, id=<n/a>, set=(null)
> > > > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 5 for pingd[ha3]=0: OK (0)
> > > > Jun 11 13:01:01 ha3 ping(ha3_fabric_ping)[5060]: WARNING: pingd is less than failure_score(1)
> > > > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_start_0 (call=18, rc=1, cib-update=37, confirmed=true) unknown error
> > > > Jun 11 13:01:01 ha3 crmd[5014]: warning: status_from_rc: Action 4 (ha3_fabric_ping_start_0) on ha3 failed (target: 0 vs. rc: 1): Error
> > > > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402509661)
> > > > Jun 11 13:01:01 ha3 crmd[5014]: warning: update_failcount: Updating failcount for ha3_fabric_ping on ha3 after failed start: rc=1 (update=INFINITY, time=1402509661)
> > > > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 1 (Complete=4, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-81.bz2): Stopped
> > > > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 6 with 1 changes for fail-count-ha3_fabric_ping, id=<n/a>, set=(null)
> > > > Jun 11 13:01:01 ha3 attrd[5012]: notice: write_attribute: Sent update 7 with 1 changes for last-failure-ha3_fabric_ping, id=<n/a>, set=(null)
> > > > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > > > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop    ha3_fabric_ping	   (ha3)
> > > > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-304.bz2
> > > > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 6 for fail-count-ha3_fabric_ping[ha3]=INFINITY: OK (0)
> > > > Jun 11 13:01:01 ha3 attrd[5012]: notice: attrd_cib_callback: Update 7 for last-failure-ha3_fabric_ping[ha3]=1402509661: OK (0)
> > > > Jun 11 13:01:01 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > > Jun 11 13:01:01 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > > > Jun 11 13:01:01 ha3 pengine[5013]: notice: LogActions: Stop    ha3_fabric_ping	   (ha3)
> > > > Jun 11 13:01:01 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-305.bz2
> > > > Jun 11 13:01:01 ha3 crmd[5014]: notice: te_rsc_command: Initiating action 4: stop ha3_fabric_ping_stop_0 on ha3 (local)
> > > > Jun 11 13:01:01 ha3 crmd[5014]: notice: process_lrm_event: LRM operation ha3_fabric_ping_stop_0 (call=19, rc=0, cib-update=41, confirmed=true) ok
> > > > Jun 11 13:01:01 ha3 crmd[5014]: notice: run_graph: Transition 3 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-305.bz2): Complete
> > > > Jun 11 13:01:01 ha3 crmd[5014]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > > > Jun 11 13:01:06 ha3 attrd[5012]: notice: write_attribute: Sent update 8 with 1 changes for pingd, id=<n/a>, set=(null)
> > > > Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > > > Jun 11 13:01:06 ha3 pengine[5013]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > > Jun 11 13:01:06 ha3 pengine[5013]: warning: unpack_rsc_op_failure: Processing failed op start for ha3_fabric_ping on ha3: unknown error (1)
> > > > Jun 11 13:01:06 ha3 pengine[5013]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-306.bz2
> > > > Jun 11 13:01:06 ha3 crmd[5014]: notice: run_graph: Transition 4 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-306.bz2): Complete
> > > > Jun 11 13:01:06 ha3 crmd[5014]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > > > Jun 11 13:01:06 ha3 attrd[5012]: notice: attrd_cib_callback: Update 8 for pingd[ha3]=(null): OK (0)
> > > > 
> > > > /etc/corosync/corosync.conf
> > > > # Please read the corosync.conf.5 manual page
> > > > totem {
> > > >     version: 2
> > > > 
> > > >     crypto_cipher: none
> > > >     crypto_hash: none
> > > > 
> > > >     interface {
> > > >         ringnumber: 0
> > > >         bindnetaddr: 10.10.0.0
> > > >         mcastport: 5405
> > > >         ttl: 1
> > > >     }
> > > >     transport: udpu
> > > > }
> > > > 
> > > > logging {
> > > >     fileline: off
> > > >     to_logfile: no
> > > >     to_syslog: yes
> > > >     #logfile: /var/log/cluster/corosync.log
> > > >     debug: off
> > > >     timestamp: on
> > > >     logger_subsys {
> > > >         subsys: QUORUM
> > > >         debug: off
> > > >     }
> > > > }
> > > > 
> > > > nodelist {
> > > >     node {
> > > >         ring0_addr: 10.10.0.14
> > > >     }
> > > > 
> > > >     node {
> > > >         ring0_addr: 10.10.0.15
> > > >     }
> > > > }
> > > > 
> > > > quorum {
> > > >     # Enable and configure quorum subsystem (default: off)
> > > >     # see also corosync.conf.5 and votequorum.5
> > > >     provider: corosync_votequorum
> > > >     expected_votes: 2
> > > > }
> > > > [root at ha3 ~]# 
> > > > 
> > > > Paul Cain
> > > > 
> > > > _______________________________________________
> > > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > > 
> > > > Project Home: http://www.clusterlabs.org
> > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > Bugs: http://bugs.clusterlabs.org
