[ClusterLabs] Stonith stops after vSphere restart

Thu Feb 22 06:24:49 EST 2018

Hi,

On Thu, Feb 22, 2018 at 11:58 AM, <jota at disroot.org> wrote:

>
> Hi,
>
> I have a 2 node pacemaker cluster configured with the fence agent
> vmware_soap.
> Everything works fine until the vCenter is restarted. After that, stonith
> fails and stop.
>

This is expected as we run 'monitor' action to find out if fence device is
working. I assume that it is not responding when vCenter is restarting. If
your fencing device fails then manual intervention makes sense as you have
to have fencing working  in order to prevent data corruption.

m,

>
> [root at node1 ~]# pcs status
> Cluster name: psqltest
> Stack: corosync
> Current DC: node2 (version 1.1.16-12.el7_4.7-94ff4df) - partition with
> quorum
> Last updated: Thu Feb 22 11:30:22 2018
> Last change: Mon Feb 19 09:28:37 2018 by root via crm_resource on node1
>
> 2 nodes configured
> 6 resources configured
>
> Online: [ node1 node2 ]
>
> Full list of resources:
>
> Master/Slave Set: ms_drbd_psqltest [drbd_psqltest]
> Masters: [ node1 ]
> Slaves: [ node2 ]
> Resource Group: pgsqltest
> psqltestfs (ocf::heartbeat:Filesystem): Started node1
> psqltest_vip (ocf::heartbeat:IPaddr2): Started node1
> postgresql-94 (ocf::heartbeat:pgsql): Started node1
> vmware_soap (stonith:fence_vmware_soap): Stopped
>
> Failed Actions:
> * vmware_soap_start_0 on node1 'unknown error' (1): call=38, status=Error,
> exitreason='none',
> last-rc-change='Thu Feb 22 10:55:46 2018', queued=0ms, exec=5374ms
> * vmware_soap_start_0 on node2 'unknown error' (1): call=56, status=Error,
> exitreason='none',
> last-rc-change='Thu Feb 22 10:55:39 2018', queued=0ms, exec=5479ms
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
>
>
> [root at node1 ~]# pcs stonith show --full
> Resource: vmware_soap (class=stonith type=fence_vmware_soap)
> Attributes: inet4_only=1 ipaddr=192.168.1.1 ipport=443 login=MYDOMAIN\User
> passwd=mypass pcmk_host_list=node1,node2 power_wait=3 ssl_insecure=1
> action= pcmk_list_timeout=120s pcmk_monitor_timeout=120s
> pcmk_status_timeout=120s
> Operations: monitor interval=60s (vmware_soap-monitor-interval-60s)
>
>
> I need to manually perform a "resource cleanup vmware_soap" to put it
> online again.
> Is there any way to do this automatically?.
> Is it possible to detect vSphere online again and enable stonith?.
>
> Thanks.
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180222/86f7bb50/attachment-0002.html>