[Pacemaker] pacemaker stonith No such device

Dvorak Andreas Andreas.Dvorak at baaderbank.de
Wed Jul 9 06:53:43 EDT 2014


Dear all,

Unfortunately, STONITH does not work on my Pacemaker cluster. If I run ifdown on the two cluster interconnect interfaces of server sv2827, server sv2828 wants to fence sv2827, but the messages log says:
    error: remote_op_done: Operation reboot of sv2827-p1 by sv2828-p1 for crmd.7979@sv2828-p1.076062f0: No such device
Can somebody please help me?


Jul  9 12:42:49 sv2828 corosync[7749]:   [CMAN  ] quorum lost, blocking activity
Jul  9 12:42:49 sv2828 corosync[7749]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul  9 12:42:49 sv2828 corosync[7749]:   [QUORUM] Members[1]: 1
Jul  9 12:42:49 sv2828 corosync[7749]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  9 12:42:49 sv2828 crmd[7979]:   notice: cman_event_callback: Membership 1492: quorum lost
Jul  9 12:42:49 sv2828 crmd[7979]:   notice: crm_update_peer_state: cman_event_callback: Node sv2827-p1[2] - state is now lost (was member)
Jul  9 12:42:49 sv2828 crmd[7979]:  warning: match_down_event: No match for shutdown action on sv2827-p1
Jul  9 12:42:49 sv2828 crmd[7979]:   notice: peer_update_callback: Stonith/shutdown of sv2827-p1 not matched
Jul  9 12:42:49 sv2828 kernel: dlm: closing connection to node 2
Jul  9 12:42:49 sv2828 corosync[7749]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.2.28) r(1) ip(192.168.3.28) ; members(old:2 left:1)
Jul  9 12:42:49 sv2828 crmd[7979]:   notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Jul  9 12:42:49 sv2828 corosync[7749]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  9 12:42:49 sv2828 crmd[7979]:  warning: match_down_event: No match for shutdown action on sv2827-p1
Jul  9 12:42:49 sv2828 crmd[7979]:   notice: peer_update_callback: Stonith/shutdown of sv2827-p1 not matched
Jul  9 12:42:49 sv2828 attrd[7977]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jul  9 12:42:49 sv2828 attrd[7977]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-MYSQLFS (1404902183)
Jul  9 12:42:49 sv2828 attrd[7977]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jul  9 12:42:49 sv2828 attrd[7977]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-MYSQL (1404901921)
Jul  9 12:42:49 sv2828 pengine[7978]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jul  9 12:42:49 sv2828 pengine[7978]:  warning: pe_fence_node: Node sv2827-p1 will be fenced because the node is no longer part of the cluster
Jul  9 12:42:49 sv2828 pengine[7978]:  warning: determine_online_status: Node sv2827-p1 is unclean
Jul  9 12:42:49 sv2828 pengine[7978]:  warning: custom_action: Action ipmi-fencing-sv2828_stop_0 on sv2827-p1 is unrunnable (offline)
Jul  9 12:42:49 sv2828 pengine[7978]:  warning: stage6: Scheduling Node sv2827-p1 for STONITH
Jul  9 12:42:49 sv2828 pengine[7978]:   notice: LogActions: Move    ipmi-fencing-sv2828#011(Started sv2827-p1 -> sv2828-p1)
Jul  9 12:42:49 sv2828 pengine[7978]:  warning: process_pe_message: Calculated Transition 38: /var/lib/pacemaker/pengine/pe-warn-28.bz2
Jul  9 12:42:49 sv2828 crmd[7979]:   notice: te_fence_node: Executing reboot fencing operation (20) on sv2827-p1 (timeout=60000)
Jul  9 12:42:49 sv2828 stonith-ng[7975]:   notice: handle_request: Client crmd.7979.6c35e3f1 wants to fence (reboot) 'sv2827-p1' with device '(any)'
Jul  9 12:42:49 sv2828 stonith-ng[7975]:   notice: initiate_remote_stonith_op: Initiating remote operation reboot for sv2827-p1: 076062f0-eff3-4798-a504-16c5c5666a5b (0)
Jul  9 12:42:49 sv2828 stonith-ng[7975]:   notice: can_fence_host_with_device: ipmi-fencing-sv2827 can not fence sv2827-p1: static-list
Jul  9 12:42:49 sv2828 stonith-ng[7975]:   notice: can_fence_host_with_device: ipmi-fencing-sv2828 can not fence sv2827-p1: static-list
Jul  9 12:42:49 sv2828 stonith-ng[7975]:    error: remote_op_done: Operation reboot of sv2827-p1 by sv2828-p1 for crmd.7979@sv2828-p1.076062f0: No such device
Jul  9 12:42:49 sv2828 crmd[7979]:   notice: tengine_stonith_callback: Stonith operation 8/20:38:0:71703806-8a7c-447f-a033-e3c26abd607c: No such device (-19)
Jul  9 12:42:49 sv2828 crmd[7979]:   notice: run_graph: Transition 38 (Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-28.bz2): Stopped
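
The lines that seem most relevant in the log above are the two can_fence_host_with_device messages, where stonith-ng reports for each device whether it can fence the target and the check it applied (here static-list). As a rough sketch only, the following Python snippet pulls those checks out of a messages log. The function name fence_checks and the regular expression are my own illustration and not part of Pacemaker; the "can fence" (positive) wording is assumed to mirror the "can not fence" lines shown here.

```python
import re

# Matches stonith-ng lines such as:
#   ... can_fence_host_with_device: ipmi-fencing-sv2827 can not fence sv2827-p1: static-list
# The positive form ("can fence") is an assumption based on the negative lines in the log.
CAN_FENCE = re.compile(
    r"can_fence_host_with_device:\s+(?P<device>\S+)\s+"
    r"(?P<verdict>can not|can)\s+fence\s+(?P<target>\S+):\s+(?P<reason>\S+)"
)


def fence_checks(log_lines):
    """Return (device, target, able, reason) for every device check found."""
    checks = []
    for line in log_lines:
        m = CAN_FENCE.search(line)
        if m:
            able = m["verdict"] == "can"
            checks.append((m["device"], m["target"], able, m["reason"]))
    return checks


# Three lines copied from the log in this mail.
SAMPLE = [
    "Jul  9 12:42:49 sv2828 stonith-ng[7975]:   notice: can_fence_host_with_device: "
    "ipmi-fencing-sv2827 can not fence sv2827-p1: static-list",
    "Jul  9 12:42:49 sv2828 stonith-ng[7975]:   notice: can_fence_host_with_device: "
    "ipmi-fencing-sv2828 can not fence sv2827-p1: static-list",
    "Jul  9 12:42:49 sv2828 stonith-ng[7975]:    error: remote_op_done: "
    "Operation reboot of sv2827-p1 by sv2828-p1 for crmd.7979: No such device",
]

if __name__ == "__main__":
    for device, target, able, reason in fence_checks(SAMPLE):
        status = "can" if able else "can NOT"
        print(f"{device} {status} fence {target} (check: {reason})")
```

Run against the full /var/log/messages (for example, fence_checks(open("/var/log/messages"))), this would list every device that stonith-ng considered for a target. In the excerpt above, neither of the two configured devices was accepted for sv2827-p1.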

Using ipmitool, I was able to verify that IPMI itself works correctly, for example with a power cycle.

pcs status
Cluster name: mysql-int-prod
Last updated: Wed Jul  9 12:46:43 2014
Last change: Wed Jul  9 12:41:14 2014 via crm_resource on sv2828-p1
Stack: cman
Current DC: sv2828-p1 - partition with quorum
Version: 1.1.10-1.el6_4.4-368c726
2 Nodes configured
5 Resources configured
Online: [ sv2827-p1 sv2828-p1 ]
Full list of resources:
 ipmi-fencing-sv2827    (stonith:fence_ipmilan):    Started sv2828-p1
 ipmi-fencing-sv2828    (stonith:fence_ipmilan):    Started sv2827-p1
 ClusterIP              (ocf::heartbeat:IPaddr2):   Started sv2828-p1
 MYSQLFS                (ocf::baader:MYSQLFS):      Started sv2828-p1
 MYSQL                  (ocf::baader:MYSQL):        Started sv2828-p1


pcs property
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.10-1.el6_4.4-368c726
last-lrm-refresh: 1404902348
no-quorum-policy: ignore
stonith-enabled: true


Best regards
Andreas Dvorak
