[Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1
    Mistina Michal 
    Michal.Mistina at virte.sk
       
    Thu Jul 18 12:46:19 UTC 2013
    
    
  
Hi Andrew.
Thank you for the insight. I tried setting higher timeout limits in the
fence_vmware_soap properties in the CIB. After I raised these values, I no
longer saw SIGTERM or SIGKILL.
However, automatic fencing was still not successful.
I don't understand why "manual fencing" with the fence_vmware_soap command
works while automatic fencing with the same parameters doesn't.
The corosync.log excerpt further down shows some parsing errors. I think
they relate to unusual characters in the names of the virtual machines that
run on the ESX host. That would explain the failure if the name of the fenced
VMware machine contained such characters, but it doesn't. The corosync.log
shows the names of other virtual machines on the ESX host.
Is it safe to say the issue occurs within the fence_vmware_soap resource
agent because it cannot handle something, maybe the names of the virtual
machines? If so, I will try to update that agent. I am using version
fence-agents-3.1.5-17.el6.x86_64.
Is there a chance that changing the timeout limits will help? I have a
feeling that timeouts don't solve anything; it times out because of
something else.
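One way to check the VM-name theory outside Pacemaker might be to look at what the agent's list action returns. According to the agent metadata, the list is CSV with a configurable separator (default ","), and the log fragments suggest name,uuid pairs. The Python sketch below only illustrates that check and is not stonith-ng's actual parser; the helper names vm_names and can_fence, the case-insensitive comparison, and the PCMK1 sample entry with its placeholder UUID are my assumptions.

```python
def vm_names(list_output, separator=","):
    """Return VM names from 'name<sep>uuid' lines (hypothetical helper)."""
    names = []
    for line in list_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The UUID is the last field; rpartition keeps separators inside a name intact.
        name, _, _uuid = line.rpartition(separator)
        names.append(name.strip() if name else line)
    return names


def can_fence(node, list_output):
    """Case-insensitive membership test; an assumption, not Pacemaker's logic."""
    wanted = node.lower()
    return any(n.lower() == wanted for n in vm_names(list_output))


sample = "\n".join([
    # Fragment adapted from the corosync.log warnings
    "[ MEDI_gw_chckpnt ],4222f42e-58c6-dc59-2a00-10041ad5ac08",
    # Hypothetical entry with a placeholder UUID
    "PCMK1,00000000-0000-0000-0000-000000000001",
])
print(vm_names(sample))
print(can_fence("pcmk1", sample))
print(can_fence("pcmk2", sample))
```

Feeding it the real output of fence_vmware_soap with "-o list" would show whether the name of the node to be fenced is present at all in what the agent reports.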
This is how the crm configuration looks now....
[root@pcmk1 ~]# crm configure show
node pcmk1
node pcmk2
primitive drbd_pg ocf:linbit:drbd \
        params drbd_resource="postgres" \
        op monitor interval="15" role="Master" \
        op monitor interval="16" role="Slave" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="120"
primitive pg_fs ocf:heartbeat:Filesystem \
        params device="/dev/vg_local-lv_pgsql/lv_pgsql" directory="/var/lib/pgsql/9.2/data" options="noatime,nodiratime" fstype="xfs" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="120"
primitive pg_lsb lsb:postgresql-9.2 \
        op monitor interval="30" timeout="60" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive pg_lvm ocf:heartbeat:LVM \
        params volgrpname="vg_local-lv_pgsql" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30"
primitive pg_vip ocf:heartbeat:IPaddr2 \
        params ip="x.x.x.x" iflabel="tstcapsvip" \
        op monitor interval="5"
primitive vm-fence-pcmk1 stonith:fence_vmware_soap \
        params ipaddr="x.x.x.x" login="administrator" passwd="password" port="PCMK1" ssl="1" retry_on="10" shell_timeout="120" login_timeout="120" action="reboot" \
        op start interval="0" timeout="120"
primitive vm-fence-pcmk2 stonith:fence_vmware_soap \
        params ipaddr="x.x.x.x" login="administrator" passwd="password" port="PCMK2" ssl="1" retry_on="10" shell_timeout="120" login_timeout="120" action="reboot" \
        op start interval="0" timeout="120"
group PGServer pg_lvm pg_fs pg_lsb pg_vip \
        meta target-role="Started"
ms ms_drbd_pg drbd_pg \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location l-st-pcmk1 vm-fence-pcmk1 -inf: pcmk1
location l-st-pcmk2 vm-fence-pcmk2 -inf: pcmk2
location master-prefer-node1 pg_vip 50: pcmk1
colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master
order ord_pg inf: ms_drbd_pg:promote PGServer:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="4" \
        stonith-enabled="true" \
        no-quorum-policy="ignore" \
        maintenance-mode="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
The command crm_verify -LV reports nothing.
[root@pcmk1 ~]# crm_verify -LV
[root@pcmk1 ~]# crm_mon -1
============
Last updated: Thu Jul 18 14:23:15 2013
Last change: Thu Jul 18 14:20:54 2013 via crm_resource on pcmk1
Stack: openais
Current DC: pcmk2 - partition WITHOUT quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 4 expected votes
8 Resources configured.
============
Online: [ pcmk1 pcmk2 ]
 Resource Group: PGServer
     pg_lvm     (ocf::heartbeat:LVM):   Started pcmk1
     pg_fs      (ocf::heartbeat:Filesystem):    Started pcmk1
     pg_lsb     (lsb:postgresql-9.2):   Started pcmk1
     pg_vip     (ocf::heartbeat:IPaddr2):       Started pcmk1
 Master/Slave Set: ms_drbd_pg [drbd_pg]
     Masters: [ pcmk1 ]
     Slaves: [ pcmk2 ]
 vm-fence-pcmk1     (stonith:fence_vmware_soap):    Started pcmk2
 vm-fence-pcmk2     (stonith:fence_vmware_soap):    Started pcmk1
If I simulate split-brain by unplugging the cable from the secondary server
pcmk2, /var/log/cluster/corosync.log on the primary server pcmk1 shows
this...
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:     info:
can_fence_host_with_device:      Refreshing port list for vm-fence-pcmk2
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [106.15],4222ac70-92c3-bddf-b524-24d848080cb2
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [107.25],42224003-b614-5eb2-f141-5437fc8319d8
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [107.29],4222719f-7bdc-84b2-4494-848a29c2bd5f
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (0 1): [ MEDI - WinXP with SP3 - MSDN
],4222238c-c927-3af1-f2e7-e0dd374d373b
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (31 32): ],4222238c-c927-3af1-f2e7-e0dd374d373b
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (0 1): [ MEDI WIN7 32-bit  -
MSDN],42223e4a-9541-2326-2a21-3b3532756b47
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 22): [105.233],42220acd-6e21-4380-9b81-89d86f14317d
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (9 17): [106.21],42223377-1443-a44c-1dc0-815c2542898e
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (12 20): [106.29],4222394a-70f1-4612-6fcd-4525e13b0cc4
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (0 1): [ MEDI W2K8 R2 SP1 STD - MSDN
],4222dc65-6752-b1b4-c0f7-38c94cd5609a
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (30 31): ],4222dc65-6752-b1b4-c0f7-38c94cd5609a
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (12 20): [106.52],4222aa80-0fe6-66c4-8d11-fea5f547b566
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [106.14],422249fc-a902-ba5c-deb0-e6db6198b984
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (18 25): [106.2],4222851c-1a9d-021a-4e16-9f8adc5bcc42
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (12 20): [106.28],422235ab-83c4-c0b7-812b-bc5b7019aff7
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [106.26],4222bbff-48eb-d60c-0347-430b8d72baa2
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [107.27],4222da62-3c55-37f8-f6b8-239657892914
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (0 1): [ MEDI WIN7 64-bit - MSDN
],4222289e-0bd2-4280-c0f4-548fd42e7eab
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (26 27): ],4222289e-0bd2-4280-c0f4-548fd42e7eab
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (17 26): [105.242],42228b51-4ef6-f9b8-b64a-882d68023074
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (20 29): [105.230],42223dcd-22c1-a0f7-c629-5c4489e2c55d
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (0 1): [ W2K3 R2 ENT 32-bit ENG
],4233c1c8-e0f9-26f3-b854-6376ec6b1d1c
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (25 26): ],4233c1c8-e0f9-26f3-b854-6376ec6b1d1c
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (9 17): [106.20],422285ba-6a31-0832-1b38-a910031cd057
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [106.27],4222d166-5647-79a3-d9d8-f90650b6188b
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (21 30): [105.231],4222308c-41c7-02e9-3b20-c6df71838db9
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (25 28): !!! [105.235],422283ac-c5d9-4bf1-96eb-a57d8d18c118
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (29 38): [105.235],422283ac-c5d9-4bf1-96eb-a57d8d18c118
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (12 20): [106.13],42222137-0d67-ac9b-e3b6-11fb6d2c33e0
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (17 26): [105.241],4222a40f-d91a-0e4f-2292-ef92c4836bb5
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (17 26): [105.243],42222a9a-7440-6d19-b654-42c08a2abd69
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (0 1): [ MEDI W2K8 R2 SP1 ENT - MSDN
],42227507-c4fd-c5aa-b7d7-4ececd284f84
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (30 31): ],42227507-c4fd-c5aa-b7d7-4ececd284f84
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (0 1): [ MEDI_gw_chckpnt
],4222f42e-58c6-dc59-2a00-10041ad5ac08
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (18 19): ],4222f42e-58c6-dc59-2a00-10041ad5ac08
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 22): [105.234],422295e3-644e-8b51-a373-e7f166b2fd5d
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 22): [105.232],42228f9d-615f-1c3b-2158-d3ad08d40357
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (17 26): [105.240],4222b273-68e7-379d-b874-6a47211e9449
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [107.28],4222cbc8-565d-eee1-4430-555b059663d0
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 22): [105.236],4222115e-789a-66dd-95e9-786ec0d84ec0
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [107.26],4222fb16-fadc-9031-8e3d-110225505a0f
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (12 20): [106.12],42226bf9-8e78-9356-773c-ecde31cf2fa2
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (12 20): [106.51],4222ae99-f1d9-9811-d72b-10e875c58f56
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:     info:
can_fence_host_with_device:      vm-fence-pcmk2 can not fence pcmk2:
dynamic-list
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:     info: stonith_command:
Processed st_query from pcmk1: rc=0
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:    error: remote_op_done:
Operation reboot of pcmk2 by <no-one> for
pcmk1[7496e5e6-4ab4-4028-b44d-c34c52a3fd04]: Operation timed out
Jul 18 14:31:00 [1498] pcmk1       crmd:     info: tengine_stonith_callback:
StonithOp <remote-op state="0" st_target="pcmk2" st_op="reboot" />
Jul 18 14:31:00 [1498] pcmk1       crmd:   notice: tengine_stonith_callback:
Stonith operation 4 for pcmk2 failed (Operation timed out): aborting
transition.
Jul 18 14:31:00 [1498] pcmk1       crmd:     info: abort_transition_graph:
tengine_stonith_callback:454 - Triggered transition abort (complete=0) :
Stonith failed
Jul 18 14:31:00 [1498] pcmk1       crmd:   notice: tengine_stonith_notify:
Peer pcmk2 was not terminated (reboot) by <anyone> for pcmk1: Operation
timed out (ref=ca100580-8e00-49d4-b895-c538139a28dd)
Jul 18 14:31:00 [1498] pcmk1       crmd:   notice: run_graph:       ====
Transition 2 (Complete=7, Pending=0, Fired=0, Skipped=4, Incomplete=5,
Source=/var/lib/pengine/pe-warn-34.bz2): Stopped
Jul 18 14:31:00 [1498] pcmk1       crmd:   notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 18 14:31:00 [1497] pcmk1    pengine:   notice: unpack_config:   On loss
of CCM Quorum: Ignore
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: pe_fence_node:   Node
pcmk2 will be fenced because it is un-expectedly down
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: determine_online_status:
Node pcmk2 is unclean
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: custom_action:   Action
drbd_pg:1_stop_0 on pcmk2 is unrunnable (offline)
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: custom_action:   Marking
node pcmk2 unclean
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: custom_action:   Action
drbd_pg:1_stop_0 on pcmk2 is unrunnable (offline)
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: custom_action:   Marking
node pcmk2 unclean
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: custom_action:   Action
vm-fence-pcmk1_stop_0 on pcmk2 is unrunnable (offline)
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: custom_action:   Marking
node pcmk2 unclean
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: stage6:  Scheduling Node
pcmk2 for STONITH
Jul 18 14:31:00 [1497] pcmk1    pengine:   notice: LogActions:      Stop
drbd_pg:1       (pcmk2)
Jul 18 14:31:00 [1497] pcmk1    pengine:   notice: LogActions:      Stop
vm-fence-pcmk1      (pcmk2)
Jul 18 14:31:00 [1498] pcmk1       crmd:   notice: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jul 18 14:31:00 [1498] pcmk1       crmd:     info: do_te_invoke:
Processing graph 3 (ref=pe_calc-dc-1374150660-46) derived from
/var/lib/pengine/pe-warn-35.bz2
Jul 18 14:31:00 [1498] pcmk1       crmd:     info: te_rsc_command:
Initiating action 63: notify drbd_pg:0_pre_notify_stop_0 on pcmk1 (local)
Jul 18 14:31:00 pcmk1 lrmd: [1495]: info: rsc:drbd_pg:0:28: notify
Jul 18 14:31:00 [1498] pcmk1       crmd:   notice: te_fence_node:
Executing reboot fencing operation (53) on pcmk2 (timeout=60000)
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:     info:
initiate_remote_stonith_op:      Initiating remote operation reboot for
pcmk2: d69db4e3-7d3b-4bee-9bd5-aa7afb05c358
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: process_pe_message:
Transition 3: WARNINGs found during PE processing. PEngine Input stored in:
/var/lib/pengine/pe-warn-35.bz2
Jul 18 14:31:00 [1497] pcmk1    pengine:   notice: process_pe_message:
Configuration WARNINGs found during PE processing.  Please run "crm_verify
-L" to identify issues.
Jul 18 14:31:01 [1498] pcmk1       crmd:     info: process_lrm_event:
LRM operation drbd_pg:0_notify_0 (call=28, rc=0, cib-update=0,
confirmed=true) ok
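The parse_host_line warnings bury the few lines that matter. A minimal sketch of a filter follows, assuming each entry starts with a syslog-style timestamp and that wrapped continuation lines belong to the previous entry; entries and stonith_summary are hypothetical helper names.

```python
import re

# An entry starts with e.g. "Jul 18 14:31:00 "
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ")


def entries(text):
    """Join wrapped lines so that each log entry becomes one string."""
    current = []
    for line in text.splitlines():
        if ENTRY_START.match(line) and current:
            yield " ".join(current)
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        yield " ".join(current)


def stonith_summary(text):
    """Keep stonith-ng entries, dropping the parse_host_line warnings."""
    return [e for e in entries(text)
            if "stonith-ng" in e and "parse_host_line" not in e]


sample = """\
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:  warning: parse_host_line:
Could not parse (13 21): [106.15],4222ac70-92c3-bddf-b524-24d848080cb2
Jul 18 14:31:00 [1494] pcmk1 stonith-ng:     info:
can_fence_host_with_device:      vm-fence-pcmk2 can not fence pcmk2:
dynamic-list
Jul 18 14:31:00 [1497] pcmk1    pengine:  warning: stage6:  Scheduling Node
pcmk2 for STONITH
"""
for entry in stonith_summary(sample):
    print(entry)
```

On the excerpt above, the only stonith-ng entry left is the one saying that vm-fence-pcmk2 can not fence pcmk2 (dynamic-list), which seems to be the line to focus on.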
Regards,
Michal Mistina
-----Original Message-----
From: Andrew Beekhof [mailto:andrew at beekhof.net] 
Sent: Tuesday, July 16, 2013 5:23 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1
On 15/07/2013, at 8:56 PM, Mistina Michal <Michal.Mistina at virte.sk> wrote:
> Hi Andrew.
> 
> Here is the omitted /var/log/messages output with the stonith-ng sections.
> 
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:   notice: stonith_device_action:
> Device vm-fence-pcmk2 not found
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:     info: stonith_command:
> Processed st_execute from lrmd: rc=-12
> Jul 15 09:53:38 PCMK1 crmd[1542]:     info: process_lrm_event: LRM
> operation vm-fence-pcmk2_monitor_0 (call=11, rc=7, cib-update=21,
> confirmed=true) not running
> Jul 15 09:53:38 PCMK1 lrmd: [1539]: info: rsc:vm-fence-pcmk2:12: start
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:     info: stonith_device_register:
> Added 'vm-fence-pcmk2' to the device list (1 active devices)
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:     info: stonith_command:
> Processed st_device_register from lrmd: rc=0
> Jul 15 09:53:38 PCMK1 stonith-ng[1538]:     info: stonith_command:
> Processed st_execute from lrmd: rc=-1
> Jul 15 09:54:13 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start process
> (PID 3332) timed out (try 1).  Killing with signal SIGTERM (15).
you took too long, go away
> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: vm-fence-pcmk2:start process
> (PID 3332) timed out (try 2).  Killing with signal SIGKILL (9).
seriously go away
> Jul 15 09:54:18 PCMK1 lrmd: [1539]: WARN: operation start[12] on
> stonith::fence_vmware_soap::vm-fence-pcmk2 for client 1542, its parameters:
> passwd=[password] shell_timeout=[20] ssl=[1] login=[administrator]
> action=[reboot] crm_feature_set=[3.0.6] retry_on=[10] ipaddr=[x.x.x.x]
> port=[T1-PCMK2] login_timeout=[15] CRM_meta_timeout=[20000] : pid
> [3332] timed out
whatever that agent is doing, it's taking too long or you've not given it
long enough
> Jul 15 09:54:18 PCMK1 crmd[1542]:    error: process_lrm_event: LRM
> operation vm-fence-pcmk2_start_0 (12) Timed Out (timeout=20000ms)
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_ais_dispatch: Update
> relayed from pcmk2
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: fail-count-vm-fence-pcmk2 (INFINITY)
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_perform_update: Sent
> update 24: fail-count-vm-fence-pcmk2=INFINITY
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_ais_dispatch: Update
> relayed from pcmk2
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: last-failure-vm-fence-pcmk2 (1373874858)
> Jul 15 09:54:18 PCMK1 attrd[1540]:   notice: attrd_perform_update: Sent
> update 27: last-failure-vm-fence-pcmk2=1373874858
> Jul 15 09:54:21 PCMK1 lrmd: [1539]: info: rsc:vm-fence-pcmk2:13: stop
> Jul 15 09:54:21 PCMK1 stonith-ng[1538]:     info: stonith_device_remove:
> Removed 'vm-fence-pcmk2' from the device list (0 active devices)
> Jul 15 09:54:21 PCMK1 stonith-ng[1538]:     info: stonith_command:
> Processed st_device_remove from lrmd: rc=0
> Jul 15 09:54:21 PCMK1 crmd[1542]:     info: process_lrm_event: LRM
> operation vm-fence-pcmk2_stop_0 (call=13, rc=0, cib-update=23,
> confirmed=true) ok
> 
> What does this output mean?
> 
> Best regards,
> Michal Mistina
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew at beekhof.net]
> Sent: Monday, July 15, 2013 3:06 AM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] RHEL 6.3 + fence_vmware_soap + esx 5.1
> 
> 
> On 13/07/2013, at 10:05 PM, Mistina Michal <Michal.Mistina at virte.sk> wrote:
> 
>> Hi,
>> Does somebody know how to set up fence_vmware_soap correctly so that it
>> will start fencing a VMware machine on ESX 5.1?
>> 
>> My problem is that the fence_vmware_soap resource agent for stonith timed
>> out. I don't know why.
> 
> Nothing in the stonith-ng logs?
> 
>> 
>> [root@pcmk1 ~]# crm_verify -L -V
>> warning: unpack_rsc_op:        Processing failed op vm-fence-pcmk2_last_failure_0 on pcmk1: unknown exec error (-2)
>> warning: unpack_rsc_op:        Processing failed op vm-fence-pcmk1_last_failure_0 on pcmk2: unknown exec error (-2)
>> warning: common_apply_stickiness:      Forcing vm-fence-pcmk2 away from pcmk1 after 1000000 failures (max=1000000)
>> warning: common_apply_stickiness:      Forcing vm-fence-pcmk1 away from pcmk2 after 1000000 failures (max=1000000)
>> 
>> I have a 2-node cluster. When I manually rebooted the VMware machine by
>> calling fence_vmware_soap, it worked.
>> [root@pcmk1 ~]# fence_vmware_soap -a x.x.x.x -l administrator -p password -n "pcmk2" -o reboot -z
>> 
>> My settings are:
>> [root@pcmk1 ~]# stonith_admin -M -a fence_vmware_soap
>> <resource-agent name="fence_vmware_soap" shortdesc="Fence agent for VMWare over SOAP API">
>>  <longdesc>fence_vmware_soap is an I/O Fencing agent which can be used with the virtual machines managed by VMWare products that have SOAP API v4.1+.
>> .P
>> Name of virtual machine (-n / port) has to be used in inventory path format (e.g. /datacenter/vm/Discovered virtual machine/myMachine). In the cases when name of yours VM is unique you can use it instead. Alternatively you can always use UUID (-U / uuid) to access virtual machine.</longdesc>
>>  <vendor-url>http://www.vmware.com</vendor-url>
>>  <parameters>
>>    <parameter name="action" unique="0" required="1">
>>      <getopt mixed="-o, --action=<action>"/>
>>      <content type="string" default="reboot"/>
>>      <shortdesc lang="en">Fencing Action</shortdesc>
>>    </parameter>
>>    <parameter name="ipaddr" unique="0" required="1">
>>      <getopt mixed="-a, --ip=<ip>"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">IP Address or Hostname</shortdesc>
>>    </parameter>
>>    <parameter name="login" unique="0" required="1">
>>      <getopt mixed="-l, --username=<name>"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Login Name</shortdesc>
>>    </parameter>
>>    <parameter name="passwd" unique="0" required="0">
>>      <getopt mixed="-p, --password=<password>"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Login password or passphrase</shortdesc>
>>    </parameter>
>>    <parameter name="passwd_script" unique="0" required="0">
>>      <getopt mixed="-S, --password-script=<script>"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Script to retrieve password</shortdesc>
>>    </parameter>
>>    <parameter name="ssl" unique="0" required="0">
>>      <getopt mixed="-z, --ssl"/>
>>      <content type="boolean"/>
>>      <shortdesc lang="en">SSL connection</shortdesc>
>>    </parameter>
>>    <parameter name="port" unique="0" required="0">
>>      <getopt mixed="-n, --plug=<id>"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Physical plug number or name of virtual machine</shortdesc>
>>    </parameter>
>>    <parameter name="uuid" unique="0" required="0">
>>      <getopt mixed="-U, --uuid"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">The UUID of the virtual machine to fence.</shortdesc>
>>    </parameter>
>>    <parameter name="ipport" unique="0" required="0">
>>      <getopt mixed="-u, --ipport=<port>"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">TCP port to use for connection with device</shortdesc>
>>    </parameter>
>>    <parameter name="verbose" unique="0" required="0">
>>      <getopt mixed="-v, --verbose"/>
>>      <content type="boolean"/>
>>      <shortdesc lang="en">Verbose mode</shortdesc>
>>    </parameter>
>>    <parameter name="debug" unique="0" required="0">
>>      <getopt mixed="-D, --debug-file=<debugfile>"/>
>>      <content type="string"/>
>>      <shortdesc lang="en">Write debug information to given file</shortdesc>
>>    </parameter>
>>    <parameter name="version" unique="0" required="0">
>>      <getopt mixed="-V, --version"/>
>>      <content type="boolean"/>
>>      <shortdesc lang="en">Display version information and exit</shortdesc>
>>    </parameter>
>>    <parameter name="help" unique="0" required="0">
>>      <getopt mixed="-h, --help"/>
>>      <content type="boolean"/>
>>      <shortdesc lang="en">Display help and exit</shortdesc>
>>    </parameter>
>>    <parameter name="separator" unique="0" required="0">
>>      <getopt mixed="-C, --separator=<char>"/>
>>      <content type="string" default=","/>
>>      <shortdesc lang="en">Separator for CSV created by operation list</shortdesc>
>>    </parameter>
>>    <parameter name="power_timeout" unique="0" required="0">
>>      <getopt mixed="--power-timeout"/>
>>      <content type="string" default="20"/>
>>      <shortdesc lang="en">Test X seconds for status change after ON/OFF</shortdesc>
>>    </parameter>
>>    <parameter name="shell_timeout" unique="0" required="0">
>>      <getopt mixed="--shell-timeout"/>
>>      <content type="string" default="3"/>
>>      <shortdesc lang="en">Wait X seconds for cmd prompt after issuing command</shortdesc>
>>    </parameter>
>>    <parameter name="login_timeout" unique="0" required="0">
>>      <getopt mixed="--login-timeout"/>
>>      <content type="string" default="5"/>
>>      <shortdesc lang="en">Wait X seconds for cmd prompt after login</shortdesc>
>>    </parameter>
>>    <parameter name="power_wait" unique="0" required="0">
>>      <getopt mixed="--power-wait"/>
>>      <content type="string" default="0"/>
>>      <shortdesc lang="en">Wait X seconds after issuing ON/OFF</shortdesc>
>>    </parameter>
>>    <parameter name="delay" unique="0" required="0">
>>      <getopt mixed="--delay"/>
>>      <content type="string" default="0"/>
>>      <shortdesc lang="en">Wait X seconds before fencing is started</shortdesc>
>>    </parameter>
>>    <parameter name="retry_on" unique="0" required="0">
>>      <getopt mixed="--retry-on"/>
>>      <content type="string" default="1"/>
>>      <shortdesc lang="en">Count of attempts to retry power on</shortdesc>
>>    </parameter>
>>  </parameters>
>>  <actions>
>>    <action name="on"/>
>>    <action name="off"/>
>>    <action name="reboot"/>
>>    <action name="status"/>
>>    <action name="list"/>
>>    <action name="monitor"/>
>>    <action name="metadata"/>
>>    <action name="stop" timeout="20s"/>
>>    <action name="start" timeout="20s"/>
>>  </actions>
>> </resource-agent>
>> 
>> [root@pcmk1 ~]# crm configure show
>> node pcmk1
>> node pcmk2
>> primitive drbd_pg ocf:linbit:drbd \
>>        params drbd_resource="postgres" \
>>        op monitor interval="15" role="Master" \
>>        op monitor interval="16" role="Slave" \
>>        op start interval="0" timeout="240" \
>>        op stop interval="0" timeout="120"
>> primitive pg_fs ocf:heartbeat:Filesystem \
>>        params device="/dev/vg_local-lv_pgsql/lv_pgsql" directory="/var/lib/pgsql/9.2/data" options="noatime,nodiratime" fstype="xfs" \
>>        op start interval="0" timeout="60" \
>>        op stop interval="0" timeout="120"
>> primitive pg_lsb lsb:postgresql-9.2 \
>>        op monitor interval="30" timeout="60" \
>>        op start interval="0" timeout="60" \
>>        op stop interval="0" timeout="60"
>> primitive pg_lvm ocf:heartbeat:LVM \
>>        params volgrpname="vg_local-lv_pgsql" \
>>        op start interval="0" timeout="30" \
>>        op stop interval="0" timeout="30"
>> primitive pg_vip ocf:heartbeat:IPaddr2 \
>>        params ip="x.x.x.x" iflabel="pcmkvip" \
>>        op monitor interval="5"
>> primitive vm-fence-pcmk1 stonith:fence_vmware_soap \
>>        params ipaddr="x.x.x.x" login="administrator" passwd="password" port="pcmk1" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15" action="reboot"
>> primitive vm-fence-pcmk2 stonith:fence_vmware_soap \
>>        params ipaddr="x.x.x.x" login="administrator" passwd="password" port="pcmk2" ssl="1" retry_on="10" shell_timeout="20" login_timeout="15" action="reboot"
>> group PGServer pg_lvm pg_fs pg_lsb pg_vip
>> ms ms_drbd_pg drbd_pg \
>>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> location l-st-pcmk1 vm-fence-pcmk1 -inf: pcmk1
>> location l-st-pcmk2 vm-fence-pcmk2 -inf: pcmk2
>> location master-prefer-node1 pg_vip 50: pcmk1
>> colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master
>> order ord_pg inf: ms_drbd_pg:promote PGServer:start
>> property $id="cib-bootstrap-options" \
>>        dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>>        cluster-infrastructure="openais" \
>>        expected-quorum-votes="4" \
>>        stonith-enabled="true" \
>>        no-quorum-policy="ignore" \
>>        maintenance-mode="false"
>> rsc_defaults $id="rsc-options" \
>>        resource-stickiness="100"
>> 
>> Am I doing something wrong?
>> 
>> Best regards,
>> Michal Mistina
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org 
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org