[ClusterLabs] connection timed out fence_virsh monitor stonith

Strahil Nikolov hunter86_bg at yahoo.com
Wed Feb 26 10:44:24 EST 2020


On February 26, 2020 11:38:13 AM GMT+02:00, Luke Camilleri <luke.camilleri at zylacomputing.com> wrote:
>Hi there, first of all thank you both for your suggestions and
>observations and apologies for my late reply.
>
>I will check the logs on both hosts (although only one of them seems to
>be the issue) and will revert with any findings.
>
>Just to confirm the error message for the monitor operation:
>
>It seems that host zc-mail-2.zylacloud.com has a connection timeout to
>monitor the resource fence_zc-mail-1_virsh right?
>
>My question here is, what is the monitor operation doing to confirm
>that the monitor operation is successful?

As it is a virsh-based stonith agent, I expect that it connects to the hypervisor and runs 'virsh list' or something similar.
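
If you want to check by hand whether the hypervisor is slow to answer around the time of the failures, something like the sketch below might help. It is only a rough approximation of what I expect the monitor to do (ssh to the host and list the domains), not the agent's actual code. The host name, the user, and both timeout values are placeholders I made up, so adjust them to your setup.

```python
import subprocess
import time


def build_probe_command(host, user="root", connect_timeout=5):
    """Build an ssh command that lists libvirt domains on the hypervisor.

    host, user and connect_timeout are placeholders, not values from
    the cluster discussed in this thread.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",                      # never prompt for a password
        "-o", f"ConnectTimeout={connect_timeout}",  # cap the TCP connect time
        f"{user}@{host}",
        "virsh", "list", "--all",
    ]


def run_probe(command, timeout=20.0):
    """Run command and return (status, elapsed_seconds).

    status is 'ok' for exit code 0, 'failed' for any other exit code,
    and 'timeout' if the command did not finish within `timeout` seconds.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "timeout", time.monotonic() - start
    status = "ok" if result.returncode == 0 else "failed"
    return status, time.monotonic() - start
```

Calling run_probe(build_probe_command("your-kvm-host")) from the node that reports the error, once during the day and once around 00:00, should show whether the elapsed time grows when the host is busy.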

>Is it doing the same operation as specified in the stonith resource and
>expecting a particular exit code?
>
>Thanks once again
>
>-----Original Message-----
>From: Dan Swartzendruber <dswartz at druber.com>
>To: Cluster Labs - All topics related to open-source clustering welcomed
><users at clusterlabs.org>
>Cc: Luke Camilleri <luke.camilleri at zylacomputing.com>
>Subject: Re: [ClusterLabs] connection timed out fence_virsh monitor
>stonith
>Date: Mon, 24 Feb 2020 12:24:16 -0500
>
>
>On 2020-02-24 12:17, Strahil Nikolov wrote:
>
>On February 24, 2020 4:56:07 PM GMT+02:00, Luke Camilleri
><luke.camilleri at zylacomputing.com> wrote:
>
>Hello users, I would like to ask for assistance on the below setup
>please, mainly on the monitor fence timeout:
>
>
>I notice that the issue happens at 00:00 on both days.
>Have you checked for a backup or other cron job that is 'overloading'
>the virtualization host?
>
>
>This is a very good point.  I had a similar problem with a vsphere
>cluster.  Two hyper-converged storage appliances.  I used the
>fence-vmware-rest (or soap) stonith agent to fence the storage apps.
>Worked just fine.  Until the vcenter server appliance got busy doing
>something or other.  Next thing I know, I'm getting stonith agent
>timeouts.  I ended up switching to fence_scsi.  Not sure there is a
>good answer.  I saw on a vmware forum a recommendation to increase the
>stonith timeout, but the recommended timeout was close to a minute,
>which is enough to be a problem for the VMs in that cluster...

Best Regards,
Strahil Nikolov

