[ClusterLabs] Fence agent executing thousands of API calls per hour

Wed Aug 1 16:47:04 EDT 2018

Actually, is the monitor operation even necessary at all?  Based on my other email to the list ("Fence agent ends up stopped with no clear reason why"), it seems that the monitor sometimes fails with an "unknown error", leaving a cluster that won't fail over because it is unable to fence.  I tried reading the fence agent to determine which API calls it executes, but I couldn't work that out myself.  In any case, I don't see how the monitor offers any real value, though I'm happy to learn how I might be wrong.

> On 2018-08-01, at 2:26 PM, Casey & Gina <caseyandgina at icloud.com> wrote:
> 
> How is the interval adjusted?  Based on an example I found online, I thought `pcs resource op monitor interval=15m vmware_fence` should work, but after executing that `pcs config` still shows a monitor interval of 60s.
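
One form that may be worth trying is `pcs resource update <id> op monitor interval=...`, which is the documented way to change an operation on an existing resource. Whether it is accepted for a stonith device depends on the pcs version (this is an assumption, not something confirmed in this thread); newer releases may want `pcs stonith update` instead. The sketch below uses a made-up helper, `set_monitor_interval`, that only prints the command unless `RUN=1` is set:

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): builds the update command for a
# resource's monitor interval. By default it only prints the command;
# set RUN=1 to actually execute it on a cluster node.
set_monitor_interval() {
    resource_id=$1
    interval=$2
    # Assumption: "pcs resource update" is accepted for this stonith device.
    cmd="pcs resource update $resource_id op monitor interval=$interval"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

set_monitor_interval vmware_fence 15m
```

Afterwards, `pcs config` should show whether the monitor interval actually changed.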
> 
> Thank you,
> -- 
> Casey
> 
>> On 2018-07-31, at 9:11 AM, Casey Allen Shobe <caseyandgina at icloud.com> wrote:
>> 
>> Aha, thank you!  I missed the blatantly obvious.  I will discuss with my colleague and likely use a longer interval.
>> 
>>> On Jul 30, 2018, at 11:25 PM, Klaus Wenninger <kwenning at redhat.com> wrote:
>>> 
>>>> On 07/31/2018 01:47 AM, Casey & Gina wrote:
>>>> I've set up a number of clusters in a VMware environment, and am using the fence_vmware_rest agent for fencing (from fence-agents 4.2.1), as follows:
>>>> 
>>>> Stonith Devices:
>>>> Resource: vmware_fence (class=stonith type=fence_vmware_rest)
>>>> Attributes: ip=<host> username=<username> password=<password> ssl_insecure=1 pcmk_host_check=static-list pcmk_host_list=b-gp2-dbpg35-1;b-gp2-dbpg35-2;b-gp2-dbpg35-3
>>>> Operations: monitor interval=60s (vmware_fence-monitor-interval-60s)
>>>> 
>>>> We are using a dedicated service account on the VMware side for pacemaker.
>>>> 
>>>> The clusters are running fine, and no failover events have happened recently.  However, our VMware admin came to me asking why the pacemaker service account is logging in and executing API calls very frequently (for an environment where there are 3 clusters, 9 nodes total, he is seeing ~1400 API calls per hour as this user).  I do not see anything logged in corosync.log about why this would be, and my limited understanding was that the fence agent would only be calling the power off and reboot APIs when pacemaker couldn't get a response from a node in the cluster.  I thought that using a static-list for the host_check would prevent any API calls for getting a list of hosts, although even if that were going on I would think it would be a rare event.  His concern is that this amount of load on the VMware hosts isn't sustainable.
>>> 
>>> Haven't looked at the internals of fence_vmware_rest but
>>> sounds like 2-3 API-calls per monitoring (or around 10 API-calls
>>> if it is just one monitored instance per cluster - what the config
>>> snippet from above looks like).
>>> Have you tried to increase the 60s monitoring interval?
>>> 
>>> Klaus
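
The figures in this thread can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes every API call comes from the recurring monitor and that the call count scales linearly with how often the monitor runs, neither of which is confirmed from the agent's source. The function names are made up for illustration.

```shell
#!/bin/sh
# Estimate API calls per monitor run from an observed hourly total.
#   $1 = observed API calls per hour (~1400 in this thread)
#   $2 = number of monitored stonith instances (3 clusters, or 9 nodes)
#   $3 = monitor interval in seconds (60 in the posted config)
calls_per_monitor() {
    LC_ALL=C awk -v c="$1" -v n="$2" -v i="$3" \
        'BEGIN { printf "%.1f\n", c / (n * 3600 / i) }'
}

# Project the hourly total for a different interval, assuming the number
# of calls per monitor run stays the same.
#   $1 = observed calls per hour, $2 = old interval (s), $3 = new interval (s)
projected_calls_per_hour() {
    LC_ALL=C awk -v c="$1" -v old="$2" -v new="$3" \
        'BEGIN { printf "%.0f\n", c * old / new }'
}

calls_per_monitor 1400 3 60          # one monitor per cluster
calls_per_monitor 1400 9 60          # one monitor per node
projected_calls_per_hour 1400 60 900 # 60s -> 15m interval
```

With 3 monitored instances this works out to about 7.8 calls per monitor run, and with 9 to about 2.6, which is in line with Klaus's estimate. Under the same assumptions, a 15m interval would bring the total down to roughly 93 calls per hour.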
>>>> 
>>>> Unfortunately the logging available from VMware doesn't give a lot of information - it just reports the number of API calls, not which API(s) were called.
>>>> 
>>>> Any ideas what might be going on?  Is there a way to get increased logging for the fence agent?
>>>> 
>>>> Thanks in advance,
>>> 