[ClusterLabs] Re: Occasionally IPaddr2 resource fails to start

Donat Zenichev donat.zenichev at gmail.com
Mon Oct 7 07:40:38 EDT 2019


Hello and thank you for your answer!

So should I just disable the monitor operation entirely? In my case I would
delete the whole "op" line:
"op monitor interval=20 timeout=60 on-fail=restart"

Am I correct?
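Some context for this question, offered as a hedged sketch rather than a description of Pacemaker internals: the "op monitor" line is what makes Pacemaker check IPaddr2 every 20 seconds, and failed checks are what add to the fail count that the migration-threshold=2 and failure-timeout=900 meta attributes (in the configuration quoted below) act on. Without a recurring monitor, those settings would have no monitor failures to count. The Python model below is hypothetical (FailCountModel and its methods are invented for illustration) and assumes that a node's fail count expires failure-timeout seconds after its most recent failure, applied immediately; a real cluster only notices the expiry at its next recheck (cluster-recheck-interval=5s in the quoted configuration).

```python
from dataclasses import dataclass, field


@dataclass
class FailCountModel:
    """Hypothetical, simplified model of per-node fail counts.

    Assumptions (not taken from Pacemaker's source):
    - every failed monitor adds 1 to the fail count of the node it ran on;
    - a node's fail count expires once failure_timeout seconds have passed
      since that node's most recent failure;
    - a node may host the resource while its fail count < migration_threshold.
    """

    migration_threshold: int
    failure_timeout: int
    counts: dict = field(default_factory=dict)
    last_failure: dict = field(default_factory=dict)

    def _expire(self, now: int) -> None:
        # Drop fail counts whose most recent failure is older than the timeout.
        for node in list(self.counts):
            if now >= self.last_failure[node] + self.failure_timeout:
                del self.counts[node]
                del self.last_failure[node]

    def record_failure(self, node: str, now: int) -> None:
        # Expire stale counts first, then count this failure.
        self._expire(now)
        self.counts[node] = self.counts.get(node, 0) + 1
        self.last_failure[node] = now

    def can_run_on(self, node: str, now: int) -> bool:
        self._expire(now)
        return self.counts.get(node, 0) < self.migration_threshold


# Values from the quoted configuration: migration-threshold=2, failure-timeout=900.
model = FailCountModel(migration_threshold=2, failure_timeout=900)
model.record_failure("my-master-1", now=0)
first = model.can_run_on("my-master-1", now=1)      # still eligible: restart in place
model.record_failure("my-master-1", now=100)
second = model.can_run_on("my-master-1", now=101)   # ineligible: would move to my-master-2
later = model.can_run_on("my-master-1", now=1000)   # 900 s after the last failure: eligible
print(first, second, later)
```

With the quoted values, the first failure leaves my-master-1 eligible (so on-fail=restart restarts the resource in place), the second makes it ineligible, and it becomes eligible again once 900 seconds have passed since that second failure; the script prints "True False True".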

On Mon, Oct 7, 2019 at 2:36 PM Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:

> Hi!
>
> I can't remember the exact reason, but it was probably exactly this that
> made us remove every monitor operation from IPaddr2 (back in 2011). So far,
> no problems doing so ;-)
>
>
> Regards,
> Ulrich
> P.S.: Of course it would be nice if the real issue could be found and
> fixed.
>
> >>> Donat Zenichev <donat.zenichev at gmail.com> wrote on 20.09.2019 at
> 14:43 in message
> <CANLwQCmVjcaTzHkcJsNOXLJghtYFLvbP3fD_d4NXrNQpM_JLWw at mail.gmail.com>:
> > Hi there!
> >
> > I've got a tricky case: my IPaddr2 resource fails to start for
> > literally no reason:
> > "IPSHARED_monitor_20000 on my-master-1 'not running' (7): call=11,
> > status=complete, exitreason='',
> >    last-rc-change='Wed Sep 4 06:08:07 2019', queued=0ms, exec=0ms"
> >
> > The IPaddr2 resource recovered by itself and continued to work properly
> > after that.
> >
> > What I did afterwards was to set 'failure-timeout=900' (seconds) on my
> > IPaddr2 resource, to keep the resource from running on a node where it
> > fails. I also set 'migration-threshold=2', so IPaddr2 may fail only twice
> > before it moves to the Slave side. Meanwhile, the Master is banned for
> > 900 seconds.
> >
> > After 900 seconds the cluster tries to start IPaddr2 on the Master again;
> > if that succeeds, the fail counter is cleared.
> > That is how I work around the error mentioned above.
> >
> > I have tried hard to figure out why this happens, but I still have no
> > idea. Any clue how to find the cause?
> > Another question: can taking VM snapshots have any impact on this?
> >
> > And my configuration:
> > -------------------------------
> > node 000001: my-master-1
> > node 000002: my-master-2
> >
> > primitive IPSHARED IPaddr2 \
> > params ip=10.10.10.5 nic=eth0 cidr_netmask=24 \
> > meta migration-threshold=2 failure-timeout=900 target-role=Started \
> > op monitor interval=20 timeout=60 on-fail=restart
> >
> > location PREFER_MASTER IPSHARED 100: my-master-1
> >
> > property cib-bootstrap-options: \
> > have-watchdog=false \
> > dc-version=1.1.18-2b07d5c5a9 \
> > cluster-infrastructure=corosync \
> > cluster-name=wall \
> > cluster-recheck-interval=5s \
> > start-failure-is-fatal=false \
> > stonith-enabled=false \
> > no-quorum-policy=ignore \
> > last-lrm-refresh=1554982967
> > -------------------------------
> >
> > Thanks in advance!
> >
> > --
> > BR, Donat Zenichev
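The failed action quoted above can be decoded field by field. Pacemaker operation keys have the form <resource>_<action>_<interval in milliseconds>, so IPSHARED_monitor_20000 is the 20-second monitor from the configuration, and return code 7 is the OCF code OCF_NOT_RUNNING, i.e. the agent reported the resource as not running on that node. The Python sketch below assumes the line format of this single sample (other Pacemaker or crm_mon versions may print it differently); parse_failed_action is a hypothetical helper, not part of any Pacemaker tool.

```python
import re

# OCF resource agent exit codes as reported by Pacemaker
# (8 and 9 are used for master/slave resources).
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Operation key: <resource>_<action>_<interval in milliseconds>.
KEY_RE = re.compile(
    r"^(?P<rsc>.+)_"
    r"(?P<action>migrate_to|migrate_from|start|stop|monitor|promote|demote|notify)_"
    r"(?P<interval_ms>\d+)$"
)
# Start of a failed-action line, in the format of the single sample above.
LINE_RE = re.compile(r"^(?P<key>\S+) on (?P<node>\S+) '(?P<desc>[^']*)' \((?P<rc>\d+)\)")


def parse_failed_action(line: str) -> dict:
    """Split a failed-action line into resource, action, interval, node and rc."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized failed-action line: {line!r}")
    k = KEY_RE.match(m["key"])
    if k is None:
        raise ValueError(f"unrecognized operation key: {m['key']!r}")
    rc = int(m["rc"])
    return {
        "resource": k["rsc"],
        "action": k["action"],
        "interval_s": int(k["interval_ms"]) / 1000,
        "node": m["node"],
        "description": m["desc"],
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "unknown"),
    }


sample = ("IPSHARED_monitor_20000 on my-master-1 'not running' (7): call=11, "
          "status=complete, exitreason='',")
print(parse_failed_action(sample))
```

Decoded this way, the sample is the recurring 20-second monitor of IPSHARED on my-master-1 returning OCF_NOT_RUNNING, so the failure was reported by the monitor rather than by a start operation.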


-- 

Best regards,
Donat Zenichev