[ClusterLabs] Antw: Re: Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Feb 11 07:59:20 EST 2021


Hi!

After that problem I see this in crm_mon output:
Failed Fencing Actions:
  * reboot of h16 failed: delegate=h18, client=pacemaker-controld.9087, origin=h18, last-failed='2021-02-09 14:50:18 +01:00'

Is there a way to clean that up?
BTW: h16 was booted today, and the message is still there.
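
In case it helps anyone tracking such entries from a script: below is a minimal Python sketch that splits a "Failed Fencing Actions" entry like the one above into its fields. The regular expression and the function name parse_fencing_entry are my own assumptions, derived only from the single entry quoted here and not from any documented crm_mon output format, so treat it as a starting point rather than a reliable parser.

```python
import re

# Assumption: the entry has been joined back into one logical line
# (the mail client wrapped it after "client=...,").
ENTRY = ("* reboot of h16 failed: delegate=h18, client=pacemaker-controld.9087, "
         "origin=h18, last-failed='2021-02-09 14:50:18 +01:00'")

# Hypothetical pattern, modelled only on the single entry shown above.
PATTERN = re.compile(
    r"\*\s+(?P<action>\S+) of (?P<target>\S+) (?P<state>\w+):\s*(?P<details>.*)"
)


def parse_fencing_entry(line):
    """Return a dict of fields for one entry, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    details = entry.pop("details")
    # key=value pairs are comma-separated; a value may be single-quoted.
    for key, value in re.findall(r"([\w-]+)=('[^']*'|[^,\s]+)", details):
        entry[key] = value.strip("'")
    return entry


if __name__ == "__main__":
    print(parse_fencing_entry(ENTRY))
```

The fields come back as plain strings; turning last-failed into a timestamp is left out on purpose, since the date format may differ between versions and locales.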

Regards,
Ulrich

>>> Ulrich Windl wrote on 09.02.2021 at 16:32 in message <6022AB1C.645 : 161 : 60728>:
> >>> Klaus Wenninger <kwenning at redhat.com> wrote on 09.02.2021 at 16:12 in message <f828ec0d-7cc5-36b4-ba6b-9aed4b94992f at redhat.com>:
> > On 2/9/21 3:10 PM, Ulrich Windl wrote:
> >>>>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote on 09.02.2021 at 15:00 in message <60229563020000A10003ED82 at gwsmtp.uni-regensburg.de>:
> >>> Hi!
> >>>
> >>> I made a mistake that led to node h16 being fenced. After recovery (h16
> >>> had rejoined the cluster) I stopped the node, reconfigured the network,
> >>> then started the node again.
> >>> Then I did the same thing (not the unwanted fencing) with h18. When I 
> >>> started the node again, I saw these unexpected messages:
> >>>
> >>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  warning: received pending action we are supposed to be the owner but it's not in our records -> fail it
> > Looks like some part of your cluster had still kept the pending fence action
> > around when h18 was fencing h16. It could be that the node wasn't around
> > when this was successful, or it could have to do with an issue we had recently
> 
> The node definitely was "around" when h16 was fenced, so it must be the
> other reason (lingering around).
> 
> > that in certain cases pending fencing actions weren't properly deleted.
> > This part of the code got a major overhaul recently, and the code parts
> > referred to by, e.g., the assertion aren't there anymore.
> > That we are seeing this assertion makes me think you hit the case
> > with the lingering pending fencing actions (I think the lingering one is a
> > relayed one that looks a bit different from a plain one and thus might
> > trigger the assertion).
> > 
> > Klaus
> >>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  error: Operation 'reboot' targeting h16 on <no-one> for pacemaker-controld.9087@h18.ad643f10: No route to host
> >>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  error: stonith_construct_reply: Triggered assert at fenced_commands.c:2363 : request != NULL
> >>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  warning: Can't create a sane reply
> >>> Feb 09 14:50:18 h18 pacemaker-controld[31330]:  notice: Peer h16 was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.9087: No route to host
> >>>
> >>> On the "No route to host": I could ping h16 from h18 using the host name
> >>> without any problem.
> >>>
> >>> Two points:
> >>> Why would h18 think h16 should be fenced?
> >>> The failed assertion looks like a programming error.
> >>
> >>> Explanations?
> >>>
> >>> Regards,
> >>> Ulrich
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Manage your subscription:
> >>> https://lists.clusterlabs.org/mailman/listinfo/users 
> >>>
> >>> ClusterLabs home: https://www.clusterlabs.org/ 




