[ClusterLabs] Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

Klaus Wenninger kwenning at redhat.com
Tue Feb 9 10:12:21 EST 2021


On 2/9/21 3:10 PM, Ulrich Windl wrote:
>>>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote on 09.02.2021 at 15:00
> in message <60229563020000A10003ED82 at gwsmtp.uni-regensburg.de>:
>> Hi!
>>
>> I made a mistake that led to node h16 being fenced. After recovery (h16
>> had re-joined the cluster) I stopped the node, reconfigured the network,
>> then started the node again.
>> Then I did the same thing (without the unwanted fencing) with h18. When I
>> started the node again, I saw these unexpected messages:
>>
>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  warning: received pending action we are supposed to be the owner but it's not in our records -> fail it
Looks like some part of your cluster had still kept the pending fence action
around from when h18 was fencing h16. It could be that the node wasn't around
when the fencing succeeded, or it could be related to an issue we had recently
where, in certain cases, pending fencing actions weren't properly deleted.
This part of the code got a major overhaul recently, and the code parts
referred to by e.g. the assertion aren't there anymore.
Seeing this assertion makes me think you hit the case with the lingering
pending fencing actions (I think the lingering one is a relayed one, which
looks a bit different from a plain one and thus might trigger the assertion).

Klaus
>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  error: Operation 'reboot' targeting h16 on <no-one> for pacemaker-controld.9087 at h18.ad643f10: No route to host
>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  error: stonith_construct_reply: Triggered assert at fenced_commands.c:2363 : request != NULL
>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  warning: Can't create a sane reply
>> Feb 09 14:50:18 h18 pacemaker-controld[31330]:  notice: Peer h16 was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.9087: No route to host
>>
>> On the "No route to host": I could ping h16 from h18 using the host name 
>> without any problem.
>>
>> Two points:
>> Why would h18 think h16 should be fenced?
>> The failed assertion looks like a programming error.
>
>> Explanations?
>>
>> Regards,
>> Ulrich
>>
>>
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> ClusterLabs home: https://www.clusterlabs.org/ 


