[ClusterLabs] monitor timed out with unknown error

Sun May 5 14:43:15 EDT 2019

Is there a way to get Pacemaker to retry stopping the resource if the
stop failed?

Sincerely,
Ark.

eth at ethaniel.com

On Sun, May 5, 2019 at 11:05 PM Andrei Borzenkov <arvidjaar at gmail.com>
wrote:

> 05.05.2019 18:43, Arkadiy Kulev wrote:
> > Dear Andrei,
> >
> > I'm sorry for the screenshot, this is the only thing that I have left
> > after the crash.
> >
>
> What crash do you mean? All nodes appear up and running, you are able to
> execute commands, I do not see anything crashed.
>
> > What would the best course of action be in this situation?
>
> Configure STONITH. It is mandatory so that Pacemaker can resolve
> situations like this one, among others.
>
> For now, assuming the node problems are over, you should be able to
> clean up the resource state (crm_resource --cleanup). Restarting
> Pacemaker on all nodes would also work.
>
> > We don't have a STONITH device. But the local network is still up (both
> > nodes see each other).
> >
> > Also, what does "(blocked)" mean?
> >
>
> It means that Pacemaker cannot perform any action on this resource
> because a prerequisite was not met. In this case, the unmet prerequisite
> was a successful stop of the resource.
>
> > Sincerely,
> > Ark.
> >
> > eth at ethaniel.com
> >
> >
> > On Sun, May 5, 2019 at 9:46 PM Andrei Borzenkov <arvidjaar at gmail.com>
> > wrote:
> >
> >> 05.05.2019 16:14, Arkadiy Kulev wrote:
> >>> Hello!
> >>>
> >>> I run Pacemaker on 2 active/active hosts which balance the load of 2
> >>> public IP addresses.
> >>> A few days ago we ran a very CPU/network intensive process on one of
> >>> the 2 hosts and Pacemaker failed.
> >>>
> >>> I've attached a screenshot of the terminal to this email.
> >>>
> >>> The "Failed Actions" section shows that the IPaddr2 "monitor_30000"
> >>> operation failed with "unknown error" and a status of "Timed Out"
> >>> (queue=0ms exec=0ms). The /etc/init.d LSB script (mycluster) failed
> >>> as well (and was set to blocked).
> >>>
> >>> This completely stalled Pacemaker and the second host didn't take over
> >>> the IP address and gateway settings.
> >>>
> >>> Any ideas would be appreciated.
> >>>
> >>
> >> The stop operation failed and you have no STONITH, so Pacemaker
> >> cannot continue and is stuck.
> >>
> >>
> >>>
> >>> [image: Screen Shot 2019-04-30 at 12.36.34.png]
> >>>
> >>
> >>
> >> Images are hard to reply to, consume excessive space, and cannot be
> >> viewed using text-only clients. There is no reason to send an image
> >> when you can just copy and paste several lines of text.
> >> _______________________________________________
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/