<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Ondrej,<div>Please find my reply below:</div><div><br></div><div>1.</div><div><div><b>Stonith configuration:</b></div><div><div>[root@orana ~]# pcs config</div><div> Resource: fence-uc-orana (class=stonith type=fence_ilo4)</div><div>  Attributes: delay=0 ipaddr=fd00:1061:37:9002:: lanplus=1 login=xyz passwd=xyz pcmk_host_list=orana pcmk_reboot_action=off</div><div>  Meta Attrs: failure-timeout=3s</div><div>  Operations: monitor interval=5s on-fail=ignore (fence-uc-orana-monitor-interval-5s)</div><div>              start interval=0s on-fail=restart (fence-uc-orana-start-interval-0s)</div><div> Resource: fence-uc-tigana (class=stonith type=fence_ilo4)</div><div>  Attributes: delay=10 ipaddr=fd00:1061:37:9001:: lanplus=1 login=xyz passwd=xyz pcmk_host_list=tigana pcmk_reboot_action=off</div><div>  Meta Attrs: failure-timeout=3s</div><div>  Operations: monitor interval=5s on-fail=ignore (fence-uc-tigana-monitor-interval-5s)</div><div>              start interval=0s on-fail=restart (fence-uc-tigana-start-interval-0s)</div></div></div><div><br></div><div><div>Fencing Levels:</div><div><br></div><div>Location Constraints:</div><div>Ordering Constraints:</div><div>  start fence-uc-orana then promote unicloud-master (kind:Mandatory)</div><div>  start fence-uc-tigana then promote unicloud-master (kind:Mandatory)</div><div>Colocation Constraints:</div><div>  fence-uc-orana with unicloud-master (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)</div><div>  fence-uc-tigana with unicloud-master (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)</div></div><div><br></div><div><br></div><div>2. This is seen randomly. Since I am using colocation, stonith resources are stopped and started on new master. That time, starting of stonith is taking variable amount of time.</div><div>No other IPv6 issues are seen in the cluster nodes.</div><div><br></div><div>3. 
fence_agent version</div><div><br></div><div><div>[root@orana ~]#  rpm -qa|grep  fence-agents-ipmilan</div><div>fence-agents-ipmilan-4.0.11-66.el7.x86_64</div></div><div><br></div><div><br></div><div><b>NOTE:</b></div><div>Both IPv4 and IPv6 are configured on my ILO, with "<span style="color:rgb(0,0,0);font-family:Arial,sans-serif,Verdana,Helvetica,"LuzSans Book","HPFutura Book","Futura Bk";font-size:13px">iLO Client Applications use IPv6 first</span>" turned on.</div><div>I am also attaching the corosync logs.</div><div><br></div><div>Thanks, increasing the timeout to 60 worked, but that's not exactly what I am looking for. I need to know the exact reason behind the delay in starting these IPv6 stonith resources.</div><div><br></div><div>Regards,</div><div>Rohit</div><div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 2, 2019 at 7:22 AM Ondrej <<a href="mailto:ondrej-clusterlabs@famera.cz">ondrej-clusterlabs@famera.cz</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 3/31/19 5:40 AM, Rohit Saini wrote:<br>

> Looking for some help on this.<br>

> <br>

> Thanks,<br>

> Rohit<br>

<br>

Hi Rohit,<br>

<br>

As a good start to figure out what is happening here can you please <br>

provide more detailed information such as:<br>

<br>

1. What is the configuration of the stonith device when using IPv4 and <br>

when using IPv6? ('pcs stonith show --full' - you can obfuscate the <br>

username and password from that output, the main idea is to see whether you are <br>

using a 'hostname' or an 'IPv4/IPv6' address here.)<br>

<br>

2. What does 'sometime' mean for the IPv6 case? Is there any <br>

pattern (like every night around 3/4 am, or when there is more traffic <br>

on network, when we test XXX service, etc.) when this happens or does it <br>

look to be happening randomly? Are there any other IPv6 issues present <br>

on the system, not related to the cluster, at the time when the timeout is observed?<br>

<br>

3. Are there any messages from fence_ilo4 in the logs <br>

(/var/log/pacemaker.log, /var/log/cluster/corosync/corosync.log, <br>

/var/log/messages, ...) around the time when the timeout is reported <br>

that would suggest what could be happening?<br>

<br>

4. Which version of fence_ilo4 are you using?<br>

# rpm -qa|grep  fence-agents-ipmilan<br>

# fence-uc-orana<br>

<br>

===<br>

To give you some answers to your questions with the information provided so far:<br>

 > 1. Why is it happening only for IPv6 ILO devices? Is this some known<br>

 > issue?<br>

Based on the data provided it is not clear where the issue is. It could be <br>

DNS resolution, it could be a network issue, ...<br>
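<br>
One way to narrow down the DNS-resolution possibility mentioned above is to time the address lookup on its own. The following is only a minimal Python sketch and is not part of fence_ilo4 or pcs; the function name time_resolution and its parameters are made up for illustration. If the lookup is consistently fast, name resolution is probably not where the time is going.<br>

```python
import socket
import time


def time_resolution(host, port=623, family=socket.AF_UNSPEC):
    """Return (elapsed_seconds, addresses) for one getaddrinfo() lookup.

    Port 623 is the usual IPMI-over-LAN port; it is only used to build
    the lookup request, nothing is sent to the device itself.
    """
    start = time.monotonic()
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_DGRAM)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```

Calling time_resolution() with the iLO hostname or address and family=socket.AF_INET6, several times in a row, would show whether the lookup time varies.<br>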

<br>

 > 2. Can we increase the timeout period "exec=20006ms" to something else.<br>

Yes, you can do that and it may hide/"resolve" the issue if <br>

fence_ilo4 can finish monitoring within the newly set timeout. You can give <br>

it a try and increase this to 40 seconds to see if that yields better <br>

results in your environment. While the default 20 seconds should be <br>

enough for the majority of environments, there might be something in your <br>

case that demands more time. Note that this approach <br>

might just effectively hide the underlying issue.<br>

To increase the timeout you should increase it for both the 'start' and <br>

'monitor' operations, for example like this:<br>

<br>

# pcs stonith update fence-uc-orana op start timeout=40s op monitor <br>

timeout=40s<br>
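<br>
Since a longer timeout may only hide the underlying issue, it can help to measure how long the agent call really takes outside the cluster. Below is a minimal Python sketch under stated assumptions: the helper name time_command is hypothetical, the command to time is whatever agent invocation you would normally run by hand, and the 20-second default simply mirrors the default timeout discussed above.<br>

```python
import subprocess
import sys
import time


def time_command(argv, timeout_s=20.0, runs=3):
    """Run argv several times; return a list of (seconds, outcome) tuples.

    outcome is "ok", "failed" (non-zero exit status) or "timeout".
    """
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                argv,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
            outcome = "ok" if proc.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            outcome = "timeout"
        results.append((time.monotonic() - start, outcome))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to time is taken from the command line.
    for seconds, outcome in time_command(sys.argv[1:]):
        print("%.2f s  %s" % (seconds, outcome))
```

Running this a few times against the IPv4 and the IPv6 address of the same iLO would show whether the variation comes from the agent call itself or from somewhere else in the cluster stack.<br>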

<br>

--<br>

Ondrej<br>

<br>

> <br>

> On Thu, Mar 28, 2019 at 11:24 AM Rohit Saini <br>

> <<a href="mailto:rohitsaini111.forum@gmail.com" target="_blank">rohitsaini111.forum@gmail.com</a> <mailto:<a href="mailto:rohitsaini111.forum@gmail.com" target="_blank">rohitsaini111.forum@gmail.com</a>>> <br>

> wrote:<br>

> <br>

>     Hi All,<br>

>     I am trying fence_ilo4 with the same ILO device having both an IPv4 and an IPv6<br>

>     address. I see a discrepancy between the two behaviours:<br>

> <br>

>     *1. When ILO has IPv4 address*<br>

>     This is working fine and stonith resources are started immediately.<br>

> <br>

>     *2. When ILO has IPv6 address*<br>

>     Starting the stonith resources sometimes takes more than 20 seconds.<br>

> <br>

>     *[root@tigana ~]# pcs status*<br>

>     Cluster name: ucc<br>

>     Stack: corosync<br>

>     Current DC: tigana (version 1.1.16-12.el7-94ff4df) - partition with<br>

>     quorum<br>

>     Last updated: Wed Mar 27 00:01:37 2019<br>

>     Last change: Wed Mar 27 00:01:19 2019 by root via cibadmin on orana<br>

> <br>

>     2 nodes configured<br>

>     4 resources configured<br>

> <br>

>     Online: [ orana tigana ]<br>

> <br>

>     Full list of resources:<br>

> <br>

>       Master/Slave Set: unicloud-master [unicloud]<br>

>           Masters: [ orana ]<br>

>           Slaves: [ tigana ]<br>

>       fence-uc-orana (stonith:fence_ilo4):   FAILED orana<br>

>       fence-uc-tigana        (stonith:fence_ilo4):   Started orana<br>

> <br>

>     Failed Actions:<br>

>     * fence-uc-orana_start_0 on orana 'unknown error' (1): call=32,<br>

>     status=Timed Out, exitreason='none',<br>

>          last-rc-change='Wed Mar 27 00:01:17 2019', queued=0ms,<br>

>     exec=20006ms *<<<<<<<*<br>

> <br>

> <br>

>     *Queries:*<br>

>     1. Why is it happening only for IPv6 ILO devices? Is this some known<br>

>     issue?<br>

>     2. Can we increase the timeout period "exec=20006ms" to something else.<br>

> <br>

> <br>

>     Thanks,<br>

>     Rohit<br>

<br>

</blockquote></div>