[ClusterLabs] Issue in fence_ilo4 with IPv6 ILO IPs
Ondrej
ondrej-clusterlabs at famera.cz
Mon Apr 1 21:51:56 EDT 2019
On 3/31/19 5:40 AM, Rohit Saini wrote:
> Looking for some help on this.
>
> Thanks,
> Rohit
Hi Rohit,
As a good start to figure out what is happening here, can you please
provide more detailed information, such as:
1. What is the configuration of the stonith device when using IPv4 and
when using IPv6? ('pcs stonith show --full' - you can obfuscate the
username and password from that output; the main idea is to see whether
you are using a hostname or an IPv4/IPv6 address here.)
2. What does 'sometimes' mean when it happens with IPv6? Is there any
pattern (for example every night around 3/4 am, when there is more
traffic on the network, when you test XXX service, etc.), or does it
look to be happening randomly? Are there any other IPv6 issues present
on the system, unrelated to the cluster, at the time the timeout is
observed?
3. Are there any messages from fence_ilo4 in the logs
(/var/log/pacemaker.log, /var/log/cluster/corosync.log,
/var/log/messages, ...) around the time the timeout is reported
that would suggest what could be happening?
4. Which version of fence_ilo4 are you using?
# rpm -qa | grep fence-agents-ipmilan
===
To give you some answers to your questions with the information
provided so far:
> 1. Why is it happening only for IPv6 ILO devices? Is this some known
> issue?
Based on the data provided, it is not clear where the issue is. It
could be DNS resolution, it could be a network issue, ...
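As a sketch of how you could rule out name resolution and basic IPv6
reachability from the cluster node (the hostname, address, and
credentials below are placeholders, not taken from your output):

```shell
# Check how the iLO name resolves (AAAA vs A records) - placeholder name.
getent ahosts ilo-hostname.example.com

# Verify basic IPv6 reachability of the iLO - placeholder address.
ping6 -c 3 2001:db8::10

# Time an IPMI query directly (the same lanplus transport fence_ilo4
# uses) to see whether the delay is in the transport itself.
time ipmitool -I lanplus -H 2001:db8::10 -U admin -P secret \
    chassis power status
```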
> 2. Can we increase the timeout period "exec=20006ms" to something else.
Yes, you can do that, and it may hide/"resolve" the issue if
fence_ilo4 can finish monitoring within the newly set timeout. You can
try increasing this to 40 seconds to see if that yields better results
in your environment. While the default 20 seconds should be enough for
the majority of environments, there might be something in your case
that demands more time. Note that this approach might just effectively
hide the underlying issue.
To increase the timeout you should increase it for both the 'start' and
'monitor' operations, for example like this:
# pcs stonith update fence-uc-orana op start timeout=40s op monitor timeout=40s
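You can then confirm the new operation timeouts took effect and
measure how long the agent itself actually needs (the resource name is
from your output; the address and credentials are placeholders):

```shell
# Show the updated operation timeouts on the stonith resource.
pcs stonith show fence-uc-orana --full

# Run the agent's monitor action by hand and time it - if this
# regularly approaches 20 seconds, the operation timeout was too tight.
time fence_ilo4 --ip=2001:db8::10 --username=admin --password=secret \
    --lanplus --action=monitor
```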
--
Ondrej
>
> On Thu, Mar 28, 2019 at 11:24 AM Rohit Saini
> <rohitsaini111.forum at gmail.com <mailto:rohitsaini111.forum at gmail.com>>
> wrote:
>
> Hi All,
> I am trying fence_ilo4 with same ILO device having IPv4 and IPv6
> address. I see some discrepancy in both the behaviours:
>
> *1. When ILO has IPv4 address*
> This is working fine and stonith resources are started immediately.
>
> *2. When ILO has IPv6 address*
> Starting of stonith resources sometimes takes more than 20 seconds.
>
> *[root at tigana ~]# pcs status*
> Cluster name: ucc
> Stack: corosync
> Current DC: tigana (version 1.1.16-12.el7-94ff4df) - partition with
> quorum
> Last updated: Wed Mar 27 00:01:37 2019
> Last change: Wed Mar 27 00:01:19 2019 by root via cibadmin on orana
>
> 2 nodes configured
> 4 resources configured
>
> Online: [ orana tigana ]
>
> Full list of resources:
>
> Master/Slave Set: unicloud-master [unicloud]
> Masters: [ orana ]
> Slaves: [ tigana ]
> fence-uc-orana (stonith:fence_ilo4): FAILED orana
> fence-uc-tigana (stonith:fence_ilo4): Started orana
>
> Failed Actions:
> * fence-uc-orana_start_0 on orana 'unknown error' (1): call=32,
> status=Timed Out, exitreason='none',
> last-rc-change='Wed Mar 27 00:01:17 2019', queued=0ms,
> exec=20006ms *<<<<<<<*
>
>
> *Queries:*
> 1. Why is it happening only for IPv6 ILO devices? Is this some known
> issue?
> 2. Can we increase the timeout period "exec=20006ms" to something else.
>
>
> Thanks,
> Rohit