[ClusterLabs] Antw: [EXT] delaying start of a resource

Thu Dec 17 05:36:26 EST 2020

On Thu, Dec 17, 2020 at 1:17 PM Gabriele Bulfon <gbulfon at sonicle.com> wrote:
> Actually, reading again the "duplicated IP" message, it was xstha1 that (having the pool mounted and not seeing xstha2 anymore) got the xstha2 IP for NFS.

Which confirms my conclusions.

>
> So I think there is no worry about the zpool!

Actually there is. It still means that pacemaker believed the victim
node was eliminated before that had actually happened. If the zpool
resource had been active on xstha2 and xstha1 had performed stonith,
it is theoretically possible that xstha1 would start importing the
pool before xstha2 was finally switched off. You really need to test
how IPMI behaves with your specific hardware to make sure this is not
possible, or adjust the stonith agent to handle delays.
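
As a rough sketch of what "handling delays" could look like (this is
not the code of any existing stonith agent): after the power-off
command returns, keep polling the BMC until it really reports the
chassis as off, and only then report success. The function names, the
timeout and the interval are made up for illustration. The one
assumption about ipmitool is that "chassis power status" prints a line
such as "Chassis Power is off"; check that against your BMC.

```python
import subprocess
import time
from typing import Callable, List


def parse_power_state(output: str) -> str:
    """Map ipmitool-style status text to 'on', 'off' or 'unknown'.

    Assumes output like 'Chassis Power is off' (verify on your hardware).
    """
    text = output.strip().lower()
    if text.endswith(" off"):
        return "off"
    if text.endswith(" on"):
        return "on"
    return "unknown"


def ipmi_power_state(base_cmd: List[str]) -> str:
    """Ask the BMC for the current power state.

    base_cmd is a hypothetical caller-supplied prefix, for example
    ['ipmitool', '-I', 'lanplus', '-H', 'bmc-host', '-U', 'admin', '-E'].
    """
    result = subprocess.run(
        base_cmd + ["chassis", "power", "status"],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        return "unknown"
    return parse_power_state(result.stdout)


def confirm_power_off(
    get_power_state: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the node reports 'off' or the timeout expires.

    Returns True only when 'off' was actually observed, so a caller
    can refuse to report fencing success on a mere acknowledgement.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_power_state() == "off":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Running the same polling loop by hand right after a manual power-off
is also a simple way to measure how long your BMC takes between
acknowledging the command and the node really being down.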

To reiterate:

>
> From: Andrei Borzenkov <arvidjaar at gmail.com>
>
> It is possible that your IPMI/BMC/whatever implementation responds
> with success before it actually completes this action. I have seen at
> least some delays in the past. There is not really much that can be
> done here except adding an artificial delay to the stonith resource
> agent. You need to test IPMI functionality before using it in
> pacemaker.
>
> In this case xstha1 may have configured the xstha2_san0_IP resource
> before xstha2 was down. This would explain the duplicated IP.
>