[ClusterLabs] Stonith stops after vSphere restart

Thu Feb 22 06:40:40 EST 2018

Thanks for the responses.

So, if I understand correctly, this is the expected behaviour and it does not affect the stonith mechanism itself.

If I remember correctly, the failed status persists for hours until I clear it manually.
Is there any way to change the expiry time so that the failure clears itself?
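
For reference, this is roughly what I think the failure-timeout suggestion would look like on my cluster. It is only a minimal sketch, not something I have tested: the values 120s and 2min are arbitrary examples, the run and set_failure_expiry helpers are hypothetical names of my own, and I am assuming that "pcs resource meta" accepts a stonith resource id on this pcs version. As far as I understand, expired failures are only re-evaluated every cluster-recheck-interval, so that property is set as well.

```shell
#!/bin/sh
# Sketch only: timeout values are illustrative assumptions, not recommendations.
# With DRY_RUN=1 (the default here) the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper.
# $1 = resource id, $2 = failure-timeout, $3 = cluster-recheck-interval
set_failure_expiry() {
    # Let recorded failures of the resource expire after the given time.
    # Assumption: "pcs resource meta" works for a stonith resource id here.
    run pcs resource meta "$1" failure-timeout="$2"
    # Expired failures are only noticed when the cluster re-checks its state.
    run pcs property set cluster-recheck-interval="$3"
}

set_failure_expiry vmware_soap 120s 2min
```

Running it with DRY_RUN=0 would apply the changes for real; the dry run lets me review the exact commands first.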

On 22 February 2018 at 12:28, "Andrei Borzenkov" <arvidjaar at gmail.com> wrote:

> Stonith resource state should have no impact on actual stonith
> operation. It only reflects whether monitor was successful or not and
> serves as warning to administrator that something may be wrong. It
> should automatically clear itself after failure-timeout has expired.
> 
> On Thu, Feb 22, 2018 at 1:58 PM, <jota at disroot.org> wrote:
> 
>> Hi,
>> 
>> I have a 2 node pacemaker cluster configured with the fence agent
>> vmware_soap.
>> Everything works fine until the vCenter is restarted. After that, the
>> stonith resource fails and stops.
>> 
>> [root@node1 ~]# pcs status
>> Cluster name: psqltest
>> Stack: corosync
>> Current DC: node2 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
>> Last updated: Thu Feb 22 11:30:22 2018
>> Last change: Mon Feb 19 09:28:37 2018 by root via crm_resource on node1
>> 
>> 2 nodes configured
>> 6 resources configured
>> 
>> Online: [ node1 node2 ]
>> 
>> Full list of resources:
>> 
>> Master/Slave Set: ms_drbd_psqltest [drbd_psqltest]
>> Masters: [ node1 ]
>> Slaves: [ node2 ]
>> Resource Group: pgsqltest
>> psqltestfs (ocf::heartbeat:Filesystem): Started node1
>> psqltest_vip (ocf::heartbeat:IPaddr2): Started node1
>> postgresql-94 (ocf::heartbeat:pgsql): Started node1
>> vmware_soap (stonith:fence_vmware_soap): Stopped
>> 
>> Failed Actions:
>> * vmware_soap_start_0 on node1 'unknown error' (1): call=38, status=Error, exitreason='none',
>>     last-rc-change='Thu Feb 22 10:55:46 2018', queued=0ms, exec=5374ms
>> * vmware_soap_start_0 on node2 'unknown error' (1): call=56, status=Error, exitreason='none',
>>     last-rc-change='Thu Feb 22 10:55:39 2018', queued=0ms, exec=5479ms
>> 
>> Daemon Status:
>> corosync: active/enabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>> 
>> [root@node1 ~]# pcs stonith show --full
>> Resource: vmware_soap (class=stonith type=fence_vmware_soap)
>> Attributes: inet4_only=1 ipaddr=192.168.1.1 ipport=443 login=MYDOMAIN\User
>> passwd=mypass pcmk_host_list=node1,node2 power_wait=3 ssl_insecure=1 action=
>> pcmk_list_timeout=120s pcmk_monitor_timeout=120s pcmk_status_timeout=120s
>> Operations: monitor interval=60s (vmware_soap-monitor-interval-60s)
>> 
>> I have to run "resource cleanup vmware_soap" manually to bring it online
>> again.
>> Is there any way to do this automatically?
>> Is it possible to detect that vSphere is online again and re-enable stonith?
>> 
>> Thanks.
>> 
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org