[Pacemaker] Stonith issue with fence_virsh

Beo Banks beo.banks at googlemail.com
Wed Oct 23 04:57:28 EDT 2013


Hi,

I want to test the failover capabilities of my cluster.
I ran pkill -9 corosync on the 2nd node, and on node1 I saw that it wants to
STONITH node2, but it ends up "giving up after too many failures to fence node".

From the command line, fencing works without any problems:
fence_virsh -a host2 -l root -x -k /root/.ssh/id_rsa -o reboot -v -n zarafa02
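
A minimal sketch, assuming fence_virsh accepts the same options as key=value
lines on stdin when it is called without arguments (as far as I know that is
how the cluster drives fence agents). The option names are taken from the
stonith-zarafa02 primitive in this post; "port" as the stdin name for -n is an
assumption, and build_agent_stdin is a hypothetical helper, not part of any
package.

```python
import sys

# Sketch: build the key=value text a fence agent would read on stdin.
# Assumption: names mirror the stonith-zarafa02 primitive in this post;
# "port" is assumed to be the stdin equivalent of the -n (VM name) flag.

def build_agent_stdin(options):
    """Return one key=value pair per line, keeping the given order."""
    return "".join("{0}={1}\n".format(key, value) for key, value in options)

options = [
    ("action", "reboot"),
    ("ipaddr", "host2"),
    ("login", "root"),
    ("secure", "true"),
    ("identity_file", "/root/.ssh/id_rsa"),
    ("port", "zarafa02"),
]

payload = build_agent_stdin(options)
sys.stdout.write(payload)
```

Piping this output into fence_virsh (started with no arguments) should come
closer to what the cluster does than the flag-based call, but note that it
would really reboot the guest.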


Setup:
2x KVM guest (zarafa01 = node1 / zarafa02 = node2)
2x KVM host
RHEL 6.4
Pacemaker, Corosync, DRBD

Hopefully somebody can help me with this issue. There is also a second issue:
after running fence_virsh from the command line, the pacemaker service isn't
up on the 2nd node.

node1 /var/log/messages:
Oct 23 09:35:28 zarafa01 pengine[2866]:  warning: stage6: Scheduling Node
zarafa02 for STONITH
Oct 23 09:35:28 zarafa01 pengine[2866]:   notice: LogActions: Stop
drbd_mysql:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]:   notice: LogActions: Stop
drbd_zarafa:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]:   notice: LogActions: Stop
apache:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]:   notice: LogActions: Stop
stonith-zarafa01#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]:  warning: process_pe_message:
Calculated Transition 183: (null)
Oct 23 09:35:28 zarafa01 crmd[29263]:   notice: te_fence_node: Executing
reboot fencing operation (124) on zarafa02 (timeout=60000)
Oct 23 09:35:28 zarafa01 stonith-ng[2863]:   notice: handle_request: Client
crmd.29263.8f8f06d0 wants to fence (reboot) 'zarafa02' with device '(any)'
Oct 23 09:35:28 zarafa01 stonith-ng[2863]:   notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
zarafa02: 88604a94-8e2e-4ce4-9d08-85559e339f8e (0)
Oct 23 09:35:28 zarafa01 crmd[29263]:   notice: process_lrm_event: LRM
operation drbd_mysql_notify_0 (call=710, rc=0, cib-update=0,
confirmed=true) ok
Oct 23 09:35:28 zarafa01 crmd[29263]:   notice: process_lrm_event: LRM
operation drbd_zarafa_notify_0 (call=712, rc=0, cib-update=0,
confirmed=true) ok
Oct 23 09:36:40 zarafa01 stonith-ng[2863]:    error: remote_op_done:
Operation reboot of zarafa02 by zarafa01 for crmd.29263 at zarafa01.88604a94:
Timer expired
Oct 23 09:36:40 zarafa01 crmd[29263]:   notice: tengine_stonith_callback:
Stonith operation 5/124:183:0:cf74ef64-3995-414e-8ebd-ebacc89ace85: Timer
expired (-62)
Oct 23 09:36:40 zarafa01 crmd[29263]:   notice: tengine_stonith_callback:
Stonith operation 5 for zarafa02 failed (Timer expired): aborting
transition.
Oct 23 09:36:40 zarafa01 crmd[29263]:   notice: tengine_stonith_notify:
Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for
zarafa01: Timer expired (ref=88604a94-8e2e-4ce4-9d08-85559e339f8e) by
client crmd.29263
Oct 23 09:36:40 zarafa01 crmd[29263]:   notice: run_graph: Transition 183
(Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown):
Stopped
Oct 23 09:36:40 zarafa01 pengine[2866]:   notice: unpack_config: On loss of
CCM Quorum: Ignore
Oct 23 09:36:40 zarafa01 pengine[2866]:  warning: pe_fence_node: Node
zarafa02 will be fenced because the node is no longer part of the cluster
Oct 23 09:36:40 zarafa01 pengine[2866]:  warning: determine_online_status:
Node zarafa02 is unclean
Oct 23 09:37:52 zarafa01 crmd[29263]:   notice: tengine_stonith_callback:
Stonith operation 6 for zarafa02 failed (Timer expired): aborting
transition.
Oct 23 09:37:52 zarafa01 crmd[29263]:   notice: tengine_stonith_notify:
Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for
zarafa01: Timer expired (ref=b13b2562-4124-4e6c-acca-e1114f7d9b98) by
client crmd.29263
Oct 23 09:37:52 zarafa01 crmd[29263]:   notice: run_graph: Transition 184
(Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown):
Stopped
Oct 23 09:37:52 zarafa01 pengine[2866]:   notice: unpack_config: On loss of
CCM Quorum: Ignore
Oct 23 09:37:52 zarafa01 pengine[2866]:  warning: pe_fence_node: Node
zarafa02 will be fenced because the node is no longer part of the cluster
Oct 23 09:37:52 zarafa01 pengine[2866]:  warning: determine_online_status:
Node zarafa02 is unclean
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: determine_online_status:
Node zarafa02 is unclean
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
apache:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
apache:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
stonith-zarafa01_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]:  warning: custom_action: Action
stonith-zarafa01_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:43:52 zarafa01 pengine[2866]:   notice: LogActions: Stop
apache:1#011(zarafa02)
Oct 23 09:43:52 zarafa01 pengine[2866]:   notice: LogActions: Stop
stonith-zarafa01#011(zarafa02)
Oct 23 09:43:52 zarafa01 crmd[29263]:   notice: te_fence_node: Executing
reboot fencing operation (124) on zarafa02 (timeout=60000)
Oct 23 09:43:52 zarafa01 pengine[2866]:  warning: process_pe_message:
Calculated Transition 190: (null)
Oct 23 09:43:52 zarafa01 stonith-ng[2863]:   notice: handle_request: Client
crmd.29263.8f8f06d0 wants to fence (reboot) 'zarafa02' with device '(any)'
Oct 23 09:43:52 zarafa01 stonith-ng[2863]:   notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
zarafa02: de24f595-81e3-49f5-8886-07c8c1b22ec7 (0)
Oct 23 09:43:52 zarafa01 crmd[29263]:   notice: process_lrm_event: LRM
operation drbd_mysql_notify_0 (call=752, rc=0, cib-update=0,
confirmed=true) ok
Oct 23 09:43:52 zarafa01 crmd[29263]:   notice: process_lrm_event: LRM
operation drbd_zarafa_notify_0 (call=754, rc=0, cib-update=0,
confirmed=true) ok
Oct 23 09:44:04 zarafa01 rsyslogd-2177: imuxsock lost 92458 messages from
pid 1927 due to rate-limiting
Oct 23 09:44:04 zarafa01 rsyslogd-2177: imuxsock begins to drop messages
from pid 1927 due to rate-limiting
Oct 23 09:45:02 zarafa01 rsyslogd-2177: imuxsock lost 13836 messages from
pid 1927 due to rate-limiting
Oct 23 09:45:03 zarafa01 rsyslogd-2177: imuxsock begins to drop messages
from pid 1927 due to rate-limiting
Oct 23 09:45:04 zarafa01 stonith-ng[2863]:    error: remote_op_done:
Operation reboot of zarafa02 by zarafa01 for crmd.29263 at zarafa01.de24f595:
Timer expired
Oct 23 09:45:04 zarafa01 crmd[29263]:   notice: tengine_stonith_callback:
Stonith operation 12/124:190:0:cf74ef64-3995-414e-8ebd-ebacc89ace85: Timer
expired (-62)
Oct 23 09:45:04 zarafa01 crmd[29263]:   notice: tengine_stonith_callback:
Stonith operation 12 for zarafa02 failed (Timer expired): aborting
transition.
Oct 23 09:45:04 zarafa01 crmd[29263]:   notice: tengine_stonith_notify:
Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for
zarafa01: Timer expired (ref=de24f595-81e3-49f5-8886-07c8c1b22ec7) by
client crmd.29263
Oct 23 09:45:04 zarafa01 crmd[29263]:   notice: run_graph: Transition 190
(Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown):
Stopped
Oct 23 09:45:04 zarafa01 crmd[29263]:   notice: too_many_st_failures: Too
many failures to fence zarafa02 (11), giving up
Oct 23 09:45:08 zarafa01 rsyslogd-2177: imuxsock lost 178501 messages from
pid 1927 due to rate-limiting
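
To make the timing in this log easier to read, here is a small sketch that
measures how long each fencing attempt ran, from the te_fence_node line to the
matching "Timer expired" line. The timestamps are copied from the log above;
elapsed_seconds is a hypothetical helper, and the parsing assumes an English
locale for the month name.

```python
from datetime import datetime

# (start, end) timestamps copied from the node1 log in this post.
# Syslog lines carry no year, so both values are parsed in the same default year.
ATTEMPTS = [
    ("Oct 23 09:35:28", "Oct 23 09:36:40"),  # transition 183
    ("Oct 23 09:43:52", "Oct 23 09:45:04"),  # transition 190
]

def elapsed_seconds(start, end, fmt="%b %d %H:%M:%S"):
    """Seconds between two syslog-style timestamps from the same year."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

durations = [elapsed_seconds(start, end) for start, end in ATTEMPTS]
for (start, end), seconds in zip(ATTEMPTS, durations):
    print("{0} -> {1}: {2}s".format(start, end, seconds))
```

Both attempts run for 72 seconds, which is longer than the timeout=60000 (ms)
that te_fence_node logs. That looks like the operation running until its timer
expires rather than the agent returning an error quickly, although this is
only a reading of the timestamps.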


node zarafa01 \
        attributes standby="off"
node zarafa02 \
        attributes standby="off"
primitive apache ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="60s" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s"
primitive drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s"
primitive drbd_zarafa ocf:linbit:drbd \
        params drbd_resource="zarafa" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="240" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s"
primitive mysql_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/data/mysql" fstype="ext4" \
        options="noatime" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="30s" timeout="40s"
primitive mysql_ip ocf:heartbeat:IPaddr2 \
        params ip="0.0.0.0" iflabel="MYSQL" cidr_netmask="20" nic="eth0" \
        op monitor interval="30s"
primitive mysqld lsb:mysqld \
        op monitor interval="10" timeout="30" \
        op start interval="0" timeout="500" \
        op stop interval="0" timeout="500"
primitive stonith-zarafa01 stonith:fence_virsh \
        params pcmk_host_list="zarafa01" pcmk_host_check="static-list" \
        action="reboot" ipaddr="host01" secure="true" login="root" \
        identity_file="/root/.ssh/id_rsa" \
        op monitor interval="300s" \
        op start interval="0" timeout="60s" \
        meta failure-timeout="180s"
primitive stonith-zarafa02 stonith:fence_virsh \
        params pcmk_host_list="zarafa02" pcmk_host_check="static-list" \
        action="reboot" ipaddr="host02" secure="true" delay="5" login="root" \
        identity_file="/root/.ssh/id_rsa" \
        op monitor interval="300s" \
        op start interval="0" timeout="60s" \
        meta failure-timeout="180s"
primitive zarafa-dagent lsb:zarafa-dagent \
        op monitor interval="30" timeout="30" \
        meta target-role="Started"
primitive zarafa-gateway lsb:zarafa-gateway \
        op monitor interval="30" timeout="30"
primitive zarafa-ical lsb:zarafa-ical \
        op monitor interval="30" timeout="30"
primitive zarafa-indexer lsb:zarafa-indexer \
        op monitor interval="60" timeout="60" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"
primitive zarafa-licensed lsb:zarafa-licensed \
        op monitor interval="30" timeout="30"
primitive zarafa-monitor lsb:zarafa-monitor \
        op monitor interval="30" timeout="30"
primitive zarafa-server lsb:zarafa-server \
        op monitor interval="30" timeout="90" \
        meta target-role="Started"
primitive zarafa-spooler lsb:zarafa-spooler \
        op monitor interval="30" timeout="30"
primitive zarafa_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/data/zarafa" fstype="ext4" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="30s" timeout="40s" \
        meta target-role="Started"
primitive zarafa_ip ocf:heartbeat:IPaddr2 \
        params ip="0.0.0.1" iflabel="ZARAFA" cidr_netmask="20" nic="eth0" \
        op monitor interval="30s" \
        meta target-role="Started"
group mysql mysql_fs mysql_ip mysqld \
        meta target-role="Started"
group zarafa zarafa_fs zarafa_ip zarafa-server zarafa-spooler zarafa-dagent \
        zarafa-licensed zarafa-monitor zarafa-gateway zarafa-ical zarafa-indexer \
        meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true" target-role="Started"
ms ms_drbd_zarafa drbd_zarafa \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true" target-role="Started"
clone apache_clone apache
location cli-prefer-mysql mysql \
        rule $id="cli-prefer-rule-mysql" inf: #uname eq zarafa01
location drbd-fence-by-handler-mysql-ms_drbd_mysql ms_drbd_mysql \
        rule $id="drbd-fence-by-handler-mysql-rule-ms_drbd_mysql" \
        $role="Master" -inf: #uname ne zarafa01
location drbd-fence-by-handler-zarafa-ms_drbd_zarafa ms_drbd_zarafa \
        rule $id="drbd-fence-by-handler-zarafa-rule-ms_drbd_zarafa" \
        $role="Master" -inf: #uname ne zarafa01
location preferred_on_mysql mysql 100: zarafa01
location preferred_on_zarafa zarafa 100: zarafa01
location stonith-by-zarafa01 stonith-zarafa02 -inf: zarafa02
location stonith-by-zarafa02 stonith-zarafa01 -inf: zarafa01
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
colocation zarafa_on_drbd inf: zarafa ms_drbd_zarafa:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
order zarafa_after_drbd inf: ms_drbd_zarafa:promote zarafa:start
order zarafa_after_mysql inf: mysql:start zarafa:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        cluster-recheck-interval="5min" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1382443560" \
        maintenance-mode="off"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200" \
        failure-timeout="10min" \
        migration-threshold="3"
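
Since the manual call works and the cluster call does not, one thing worth
checking is whether both use the same values. The sketch below compares the
flags of the fence_virsh command line with the agent-facing params of the
stonith-zarafa02 primitive. The mapping from flags to names (-a to ipaddr,
-l to login, -x to secure, -k to identity_file, -o to action, -n to port) is an
assumption on my side, and differences is a hypothetical helper.

```python
# Values from the manual call in this post:
# fence_virsh -a host2 -l root -x -k /root/.ssh/id_rsa -o reboot -v -n zarafa02
cli = {
    "ipaddr": "host2",
    "login": "root",
    "secure": "true",
    "identity_file": "/root/.ssh/id_rsa",
    "action": "reboot",
    "port": "zarafa02",
}

# Agent-facing params of stonith-zarafa02 (the pcmk_* options are left out
# because they are read by the cluster, not passed to the agent as-is).
resource = {
    "ipaddr": "host02",
    "login": "root",
    "secure": "true",
    "identity_file": "/root/.ssh/id_rsa",
    "action": "reboot",
    "delay": "5",
}

def differences(left, right):
    """Map each key whose value differs to a (left, right) pair."""
    keys = sorted(set(left) | set(right))
    return dict((k, (left.get(k), right.get(k)))
                for k in keys if left.get(k) != right.get(k))

diff = differences(cli, resource)
for key in sorted(diff):
    print("{0}: cli={1!r} resource={2!r}".format(key, *diff[key]))
```

The comparison reports three differences: the host name (host2 vs. host02,
which may only be a typo in this mail), the extra delay="5", and the missing
explicit port. As far as I understand, the cluster normally passes the target
node name to the agent by itself, so the last one is not necessarily a problem.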


crm status
Last updated: Wed Oct 23 10:51:51 2013
Last change: Wed Oct 23 10:12:17 2013 via cibadmin on zarafa01
Stack: classic openais (with plugin)
Current DC: zarafa01 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
21 Resources configured.


Online: [ zarafa01 zarafa02]

 Resource Group: mysql
     mysql_fs   (ocf::heartbeat:Filesystem):    Started zarafa01
     mysql_ip   (ocf::heartbeat:IPaddr2):       Started zarafa01
     mysqld     (lsb:mysqld):   Started zarafa01
 Master/Slave Set: ms_drbd_mysql [drbd_mysql]
     Masters: [ zarafa01 ]
     Stopped: [ drbd_mysql:1 ]
 Resource Group: zarafa
     zarafa_fs  (ocf::heartbeat:Filesystem):    Started zarafa01
     zarafa_ip  (ocf::heartbeat:IPaddr2):       Started zarafa01
     zarafa-server      (lsb:zarafa-server):    Started zarafa01
     zarafa-spooler     (lsb:zarafa-spooler):   Started zarafa01
     zarafa-dagent      (lsb:zarafa-dagent):    Started zarafa01
     zarafa-licensed    (lsb:zarafa-licensed):  Started zarafa01
     zarafa-monitor     (lsb:zarafa-monitor):   Started zarafa01
     zarafa-gateway     (lsb:zarafa-gateway):   Started zarafa01
     zarafa-ical        (lsb:zarafa-ical):      Started zarafa01
     zarafa-indexer     (lsb:zarafa-indexer):   Started zarafa01
 Master/Slave Set: ms_drbd_zarafa [drbd_zarafa]
     Masters: [ zarafa01 ]
     Stopped: [ drbd_zarafa:1 ]
 Clone Set: apache_clone [apache]
     Started: [ zarafa01 ]
     Stopped: [ apache:1 ]
 stonith-zarafa02   (stonith:fence_virsh):  Started zarafa01



Thanks,
beo