[ClusterLabs] monitor timed out with unknown error

Arkadiy Kulev eth at ethaniel.com
Sun May 5 11:43:42 EDT 2019


Dear Andrei,

I'm sorry for the screenshot; it is the only thing I have left after
the crash.

What would the best course of action be in this situation?
We don't have a STONITH device, but the local network is still up (both
nodes see each other).

Also, what does "(blocked)" mean?

Sincerely,
Ark.

eth at ethaniel.com


On Sun, May 5, 2019 at 9:46 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> 05.05.2019 16:14, Arkadiy Kulev wrote:
> > Hello!
> >
> > I run pacemaker on 2 active/active hosts which balance the load of 2
> > public IP addresses.
> > A few days ago we ran a very CPU/network intensive process on one of the
> > 2 hosts and Pacemaker failed.
> >
> > I've attached a screenshot of the terminal to this email.
> >
> > The "Failed Actions" section shows that the IPaddr2 "monitor_30000"
> > operation failed with "unknown error" and a status of "Timed Out"
> > (queue=0ms exec=0ms). The /etc/init.d LSB script (mycluster) failed as
> > well (and was set to blocked).
> >
> > This completely stalled Pacemaker and the second host didn't take over
> > the IP address and gateway settings.
> >
> > Any ideas would be appreciated.
> >
>
> The stop operation failed and you have no stonith, so pacemaker cannot
> continue and is stuck.
>
>
> >
> > [image: Screen Shot 2019-04-30 at 12.36.34.png]
> >
>
>
> Images are hard to reply to, consume excessive space and cannot be
> viewed using text-only clients. There is no reason to send an image when
> you can just copy and paste several lines of text.