[ClusterLabs] Fence agent executing thousands of API calls per hour

Wed Aug 1 17:08:15 EDT 2018

On Wed, 2018-08-01 at 14:47 -0600, Casey & Gina wrote:
> Actually, is it even necessary at all?  Based on my other E-mail to
> the list (Fence agent ends up stopped with no clear reason why), it
> seems that sometimes the monitor fails with an "unknown error",
> resulting in a cluster that won't fail over due to inability to
> fence.  I tried looking at the fence agent to determine which API 

A failed monitor (or start) shouldn't prevent the cluster from using
the device for fencing. If actual fence actions are failing, that
should be seen separately from the fence resource failures.

> calls might be being executed but I can't figure that out myself...in
> any case I don't see how this is offering any real value...happy to
> learn how I might be wrong, though...
> 
> > On 2018-08-01, at 2:26 PM, Casey & Gina <caseyandgina at icloud.com>
> > wrote:
> > 
> > How is the interval adjusted?  Based on an example I found online,
> > I thought `pcs resource op monitor interval=15m vmware_fence`
> > should work, but after executing that `pcs config` still shows a
> > monitor interval of 60s.

A resource can have more than one monitor, so that command by itself
just adds a second monitor. You have to delete the original one
separately with `pcs resource op remove`.
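
As a rough sketch of that sequence: the resource name and intervals come
from the config quoted further down, while the `replace_monitor` helper
and the `PCS` override variable are made up here for illustration. The
exact `op remove` / `op add` syntax is an assumption worth checking
against `pcs resource --help` for the pcs version in use.

```shell
#!/bin/sh
# Sketch: swap a resource's monitor operation rather than adding a second one.
# PCS may be overridden (e.g. PCS=echo) to preview the commands without
# touching the cluster; this variable is a convenience of the sketch only.
PCS="${PCS:-pcs}"

# Hypothetical helper, not part of pcs itself.
replace_monitor() {
    resource="$1"      # e.g. vmware_fence
    old_interval="$2"  # e.g. 60s
    new_interval="$3"  # e.g. 15m

    # Delete the original monitor operation first ...
    "$PCS" resource op remove "$resource" monitor "interval=$old_interval" || return 1
    # ... then add the monitor with the longer interval.
    "$PCS" resource op add "$resource" monitor "interval=$new_interval"
}
```

For example, `replace_monitor vmware_fence 60s 15m`; with `PCS=echo` it
only prints the two commands. At 15m each monitor runs 4 times per hour
rather than 60. If the 15m monitor has already been added, only the
removal of the 60s one should be needed.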

> > 
> > Thank you,
> > -- 
> > Casey
> > 
> > > > On 2018-07-31, at 9:11 AM, Casey Allen Shobe
> > > > <caseyandgina at icloud.com> wrote:
> > > 
> > > Aha, thank you!  I missed the blatantly obvious.  I will discuss
> > > with my colleague and likely use a longer interval.
> > > 
> > > > > On Jul 30, 2018, at 11:25 PM, Klaus Wenninger
> > > > > <kwenning at redhat.com> wrote:
> > > > 
> > > > > On 07/31/2018 01:47 AM, Casey & Gina wrote:
> > > > > > I've set up a number of clusters in a VMware environment, and
> > > > > > am using the fence_vmware_rest agent for fencing (from
> > > > > > fence-agents 4.2.1), as follows:
> > > > > 
> > > > > > Stonith Devices:
> > > > > > Resource: vmware_fence (class=stonith type=fence_vmware_rest)
> > > > > > Attributes: ip=<host> username=<username> password=<password> ssl_insecure=1 pcmk_host_check=static-list pcmk_host_list=b-gp2-dbpg35-1;b-gp2-dbpg35-2;b-gp2-dbpg35-3
> > > > > > Operations: monitor interval=60s (vmware_fence-monitor-interval-60s)
> > > > > 
> > > > > We are using a dedicated service account on the VMware side
> > > > > for pacemaker.
> > > > > 
> > > > > The clusters are running fine, and no failover events have
> > > > > happened recently.  However, our VMware admin came to me
> > > > > asking why the pacemaker service account is logging in and
> > > > > executing API calls very frequently (for an environment where
> > > > > there are 3 clusters, 9 nodes total, he is seeing ~1400 API 
> > > > 
> > > > > Haven't looked at the internals of fence_vmware_rest but
> > > > > sounds like 2-3 API-calls per monitoring (or around 10
> > > > > API-calls if it is just one monitored instance per cluster -
> > > > > what the config snippet from above looks like).
> > > > > Have you tried to increase the 60s monitoring interval?
> > > > 
> > > > Klaus
> > > > > calls per hour as this user).  I do not see anything logged
> > > > > in corosync.log about why this would be, and my limited
> > > > > understanding was that the fence agent would only be calling
> > > > > > the power off and reboot APIs when pacemaker couldn't get a
> > > > > response from a node in the cluster.  I thought that using a
> > > > > static-list for the host_check would prevent any API calls
> > > > > for getting a list of hosts, although even if that were going
> > > > > on I would think it would be a rare event.  His concern is
> > > > > > that this amount of load on the VMware hosts isn't
> > > > > sustainable.
> > > > > 
> > > > > > Unfortunately the logging available from VMware doesn't give
> > > > > a lot of information - it just says the number of API calls,
> > > > > not which API(s) were called.
> > > > > 
> > > > > Any ideas what might be going on?  Is there a way to get
> > > > > increased logging for the fence agent?
> > > > > 
> > > > > Thanks in advance,
> > > 
> > > _______________________________________________
> > > Users mailing list: Users at clusterlabs.org
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > 
> > > Project Home: http://www.clusterlabs.org
> > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
-- 
Ken Gaillot <kgaillot at redhat.com>