[ClusterLabs] connection timed out fence_virsh monitor stonith

Luke Camilleri luke.camilleri at zylacomputing.com
Wed Feb 26 04:38:13 EST 2020

Hi there, first of all, thank you both for your suggestions and observations, and apologies for my late reply.

I will check the logs on both hosts (although only one of them seems to have the issue) and will report back with any findings.

Just to confirm my reading of the error message for the monitor operation:

It seems that host zc-mail-2.zylacloud.com gets a connection timeout when monitoring the resource fence_zc-mail-1_virsh, right?

My question is: what does the monitor operation actually do to decide that it has succeeded?

Does it run the same operation as specified in the stonith resource and expect a particular exit code?

Thanks once again

-----Original Message-----
From: Dan Swartzendruber <dswartz at druber.com>
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Cc: Luke Camilleri <luke.camilleri at zylacomputing.com>
Subject: Re: [ClusterLabs] connection timed out fence_virsh monitor stonith
Date: Mon, 24 Feb 2020 12:24:16 -0500

On 2020-02-24 12:17, Strahil Nikolov wrote:

On February 24, 2020 4:56:07 PM GMT+02:00, Luke Camilleri
<luke.camilleri at zylacomputing.com> wrote:

Hello users, I would like to ask for assistance on the below setup
please, mainly on the monitor fence timeout:

I notice that the issue happens at 00:00 on both days.
Have you checked for a backup or other cron job that is 'overloading'
the virtualization host?

This is a very good point. I had a similar problem with a vsphere
cluster. Two hyper-converged storage appliances. I used the
fence-vmware-rest (or soap) stonith agent to fence the storage apps.
Worked just fine. Until the vcenter server appliance got busy doing
something or other. Next thing I know, I'm getting stonith agent
timeouts. I ended up switching to fence_scsi. Not sure there is a good
answer. I saw on a vmware forum a recommendation to increase the
stonith timeout, but the recommended timeout was close to a minute,
which is enough to be a problem for the VMs in that cluster...
