[ClusterLabs] connection timed out fence_virsh monitor stonith
luke.camilleri at zylacomputing.com
Wed Feb 26 04:38:13 EST 2020
Hi there. First of all, thank you both for your suggestions and observations, and apologies for my late reply.
I will check the logs on both hosts (although only one of them seems to have the issue) and will report back with any findings.
Just to confirm the error message for the monitor operation:
It seems that host zc-mail-2.zylacloud.com gets a connection timeout when monitoring the resource fence_zc-mail-1_virsh, right?
My question here is: what does the monitor operation actually do to confirm that it was successful?
Is it running the same operation as specified in the stonith resource and expecting a particular exit code?
Thanks once again
From: Dan Swartzendruber <dswartz at druber.com>
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Cc: Luke Camilleri <luke.camilleri at zylacomputing.com>
Subject: Re: [ClusterLabs] connection timed out fence_virsh monitor stonith
Date: Mon, 24 Feb 2020 12:24:16 -0500
On 2020-02-24 12:17, Strahil Nikolov wrote:
On February 24, 2020 4:56:07 PM GMT+02:00, Luke Camilleri
<luke.camilleri at zylacomputing.com> wrote:
Hello users, I would like to ask for assistance on the below setup
please, mainly on the monitor fence timeout:
I notice that the issue happens at 00:00 on both days.
Have you checked for a backup or other cron job that is 'overloading'
the virtualization host?
This is a very good point. I had a similar problem with a vSphere
cluster. Two hyper-converged storage appliances. I used the
fence_vmware_rest (or fence_vmware_soap) stonith agent to fence the
storage appliances. Worked just fine. Until the vCenter server appliance
got busy doing something or other. Next thing I know, I'm getting stonith
agent timeouts. I ended up switching to fence_scsi. Not sure there is a
good answer. I saw on a VMware forum a recommendation to increase the
stonith timeout, but the recommended timeout was close to a minute,
which is enough to be a problem for the VMs in that cluster...
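
To make the timeout trade-off above concrete, here is a minimal Python sketch of a monitor-style check: run a command under a deadline and classify the result as ok, failed, or timed out. It assumes that exit code 0 means success, which is the usual convention for such checks. The function name probe and the stand-in commands are hypothetical; the sketch does not call fence_virsh or Pacemaker, and it is not a description of how Pacemaker implements its monitor operation.

```python
import subprocess
import sys


def probe(cmd, timeout_s):
    """Run cmd under a deadline and classify the outcome.

    Hypothetical helper: exit code 0 is assumed to mean success.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The command did not finish in time (e.g. a busy hypervisor).
        return "timed out"
    if result.returncode == 0:
        return "ok"
    return "failed (exit %d)" % result.returncode


# Stand-in commands (assumptions for illustration only); a real check
# would invoke the fence agent instead of the Python interpreter.
fast = [sys.executable, "-c", "import sys; sys.exit(0)"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
broken = [sys.executable, "-c", "import sys; sys.exit(1)"]

if __name__ == "__main__":
    print("fast:", probe(fast, 10))
    print("slow:", probe(slow, 1))
    print("broken:", probe(broken, 10))
```

A longer deadline makes the slow case pass, but it also delays the moment a genuinely dead device is noticed, which is the concern raised above about a timeout close to a minute.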