[ClusterLabs] Antw: Re: Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply
Ken Gaillot
kgaillot at redhat.com
Thu Feb 11 13:13:06 EST 2021
On Thu, 2021-02-11 at 13:59 +0100, Ulrich Windl wrote:
> Hi!
>
> After that problem I see this in crm_mon output:
> Failed Fencing Actions:
> * reboot of h16 failed: delegate=h18, client=pacemaker-controld.9087,
> origin=h18, last-failed='2021-02-09 14:50:18 +01:00'
>
> Is there a way to clean that up?
stonith_admin --cleanup -H h16 (or '*' for all nodes),
or the equivalent in a higher-level tool.
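
As a rough sketch of the above, the long-option form of the same command can
be wrapped in a small helper that only prints the command, so it can be
reviewed before it is run on a cluster node. The function name
fence_cleanup_cmd is made up for illustration and is not part of Pacemaker;
--cleanup and --history are the long forms of stonith_admin's -c and -H.

```shell
# Hypothetical helper, not part of Pacemaker: print the cleanup command for
# one node, or for all nodes ('*') when no argument is given.
fence_cleanup_cmd() {
    target="${1:-*}"
    printf "stonith_admin --cleanup --history '%s'\n" "$target"
}

fence_cleanup_cmd h16   # review the printed command before running it
fence_cleanup_cmd       # no argument: targets all nodes
```

On clusters managed with pcs, "pcs stonith history cleanup h16" should be the
higher-level equivalent, assuming a pcs version that has the history
subcommand.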
> BTW: h16 had been booted today and still this message is there.
Yes, that's a feature. :) Nodes sync fencing history with each other, so
it survives as long as any node remains up. That ensures the view is the
same regardless of which node you run the command on.
> Regards,
> Ulrich
>
> > > > Ulrich Windl wrote on 09.02.2021 at 16:32 in message
> > > > <6022AB1C.645 : 161 : 60728>:
> > > > > Klaus Wenninger <kwenning at redhat.com> wrote on 09.02.2021 at 16:12
> > > > > in message <f828ec0d-7cc5-36b4-ba6b-9aed4b94992f at redhat.com>:
> > > On 2/9/21 3:10 PM, Ulrich Windl wrote:
> > > > > > > "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote on
> > > > > > > 09.02.2021 at 15:00 in message
> > > > > > > <60229563020000A10003ED82 at gwsmtp.uni-regensburg.de>:
> > > > > Hi!
> > > > >
> > > > > > I had made a mistake, leading to node h16 being fenced. After recovery
> > > > > > (h16 had re-joined the cluster) I had stopped the node, reconfigured the
> > > > > > network, then started the node again.
> > > > > Then I did the same thing (not the unwanted fencing) with
> > > > > h18. When I
> > > > > started the node again, I saw these unexpected messages:
> > > > >
> > > > > > Feb 09 14:50:18 h18 pacemaker-fenced[31326]: warning: received pending action we are supposed to be the owner but it's not in our records -> fail it
> > >
> > > > Looks like some part of your cluster still had kept the pending fence
> > > > action around when h18 was fencing h16. Can be that the node wasn't
> > > > around when this was successful or it can have to do with an issue we
> > > > had recently
> >
> > > The node definitely was "around" when h16 had been fenced, so it
> > > must be the other reason (lingering around).
> >
> > > > that in certain cases pending fencing actions weren't properly deleted.
> > > > This part of the code got a major overhaul recently and the code parts
> > > > referred to by e.g. the assertion aren't there anymore.
> > > > That we are seeing this assertion makes me think you hit the case
> > > > with the lingering pending fencing actions (I think the lingering one is
> > > > a relayed one that looks a bit different from a plain one and thus might
> > > > trigger the assertion).
> > >
> > > Klaus
> > > > > > Feb 09 14:50:18 h18 pacemaker-fenced[31326]: error: Operation 'reboot' targeting h16 on <no-one> for pacemaker-controld.9087@h18.ad643f10: No route to host
> > > > > > Feb 09 14:50:18 h18 pacemaker-fenced[31326]: error: stonith_construct_reply: Triggered assert at fenced_commands.c:2363 : request != NULL
> > > > > > Feb 09 14:50:18 h18 pacemaker-fenced[31326]: warning: Can't create a sane reply
> > > > > > Feb 09 14:50:18 h18 pacemaker-controld[31330]: notice: Peer h16 was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.9087: No route to host
> > > > >
> > > > > On the "No route to host": I could ping h16 from h18 using
> > > > > the host name
> > > > > without any problem.
> > > > >
> > > > > Two points:
> > > > > Why would h18 think h16 should be fenced?
> > > > > The gailed asserztion looks like a programming error.
> > > >
> > > > "failed assertion", sorry!
> > > >
> > > > > Explanations?
> > > > >
> > > > > Regards,
> > > > > Ulrich
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Manage your subscription:
> > > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > > >
> > > > > ClusterLabs home: https://www.clusterlabs.org/
> > > >
> > > >
> > > > _______________________________________________
> > > > Manage your subscription:
> > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > >
> > > > ClusterLabs home: https://www.clusterlabs.org/
> > >
> > > _______________________________________________
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > >
> > > ClusterLabs home: https://www.clusterlabs.org/
> >
> >
> >
> >
>
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot <kgaillot at redhat.com>