[ClusterLabs] Antw: Re: Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Feb 9 10:32:44 EST 2021


>>> Klaus Wenninger <kwenning at redhat.com> wrote on 09.02.2021 at 16:12 in
message <f828ec0d-7cc5-36b4-ba6b-9aed4b94992f at redhat.com>:
> On 2/9/21 3:10 PM, Ulrich Windl wrote:
>>>>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote on 09.02.2021 at
>> 15:00 in message <60229563020000A10003ED82 at gwsmtp.uni-regensburg.de>:
>>> Hi!
>>>
>>> I had made a mistake that led to node h16 being fenced. After recovery (h16
>>> had rejoined the cluster) I stopped the node, reconfigured the network,
>>> then started the node again.
>>> Then I did the same thing (without the unwanted fencing) with h18. When I
>>> started the node again, I saw these unexpected messages:
>>>
>>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  warning: received pending action we are supposed to be the owner but it's not in our records -> fail it
> Looks like some part of your cluster had still kept the pending fence action
> around when h18 was fencing h16. It could be that the node wasn't around
> when this was successful, or it could have to do with an issue we had recently

The node definitely was "around" when h16 was fenced, so it must be the
other reason (the action lingering around).

> that in certain cases pending fencing actions weren't properly deleted.
> This part of the code got a major overhaul recently, and the parts of the
> code referred to by e.g. the assertion aren't there anymore.
> That we are seeing this assertion makes me think you hit the case
> with the lingering pending fencing actions (I think the lingering one is a
> relayed one that looks a bit different from a plain one and thus might
> trigger the assertion).
> 
> Klaus
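
To make the scenario Klaus describes easier to picture, here is a toy model in Python. It is only a sketch under assumptions and not Pacemaker's code: the class ToyFencer, its methods, and the use of "ad643f10" as an operation id are made up for illustration. It assumes that a peer still lists an operation as pending and names this node as its owner, while this node has no record of it (here, because it was restarted). The node then fails the operation, and a reply cannot be built because there is no original request to build it from.

```python
class ToyFencer:
    """Toy stand-in for a fencing daemon; NOT Pacemaker's implementation."""

    def __init__(self, name):
        self.name = name
        self.operations = {}  # op_id -> request dict for operations we started

    def start(self, op_id, target, action="reboot"):
        # Remember the request so a later result can be matched to it.
        self.operations[op_id] = {"id": op_id, "target": target, "action": action}

    def restart(self):
        # Assumption: a stopped and restarted node comes back without records.
        self.operations.clear()

    def build_reply(self, request, status):
        # Mirrors the spirit of the quoted check "request != NULL":
        # without the original request there is nothing sane to reply to.
        if request is None:
            return None
        return {"id": request["id"], "target": request["target"], "status": status}

    def on_peer_reports_pending(self, op_id, owner):
        """A peer still lists op_id as pending and names `owner` as originator."""
        if owner != self.name:
            return ("ignored", None)
        request = self.operations.get(op_id)
        if request is None:
            # Named as owner, but not in our records: fail the operation.
            return ("failed", self.build_reply(request, "failed"))
        return ("pending", self.build_reply(request, "pending"))


h18 = ToyFencer("h18")
h18.start("ad643f10", "h16")
h18.restart()
outcome, reply = h18.on_peer_reports_pending("ad643f10", "h18")
print(outcome, reply)
```

In this model the missing record is handled by returning no reply at all; whether the real daemon behaves the same way in the reworked code is not something this sketch can tell you.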
>>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  error: Operation 'reboot' targeting h16 on <no-one> for pacemaker-controld.9087 at h18.ad643f10: No route to host
>>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  error: stonith_construct_reply: Triggered assert at fenced_commands.c:2363 : request != NULL
>>> Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  warning: Can't create a sane reply
>>> Feb 09 14:50:18 h18 pacemaker-controld[31330]:  notice: Peer h16 was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.9087: No route to host
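
When collecting messages like the ones above from a longer log, a small filter can help. The following Python sketch assumes the line layout seen in the quoted messages (timestamp with a two-digit day, host, daemon[pid], level, message); the names LOG_RE, parse_line and problems are made up for this example, and other syslog configurations may format lines differently. It also maps the non-breaking hyphen (U+2011), which shows up in copied text, back to a plain "-".

```python
import re

# Assumed layout: "<Mon DD HH:MM:SS> <host> <daemon>[<pid>]:  <level>: <message>"
LOG_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>\w+): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    # Copied text may contain U+2011 instead of an ASCII hyphen.
    line = line.strip().replace("\u2011", "-")
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def problems(lines, levels=("warning", "error")):
    """Keep only the parsed lines whose level is in `levels`."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p and p["level"] in levels]


SAMPLE = [
    "Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  warning: Can't create a sane reply",
    "Feb 09 14:50:18 h18 pacemaker-fenced[31326]:  error: stonith_construct_reply: "
    "Triggered assert at fenced_commands.c:2363 : request != NULL",
    "Feb 09 14:50:18 h18 pacemaker-controld[31330]:  notice: Peer h16 was not "
    "terminated (reboot) by <anyone> on behalf of pacemaker-controld.9087: No route to host",
]

for hit in problems(SAMPLE):
    print(hit["daemon"], hit["level"], hit["message"])
```

The notice line from pacemaker-controld is dropped by the default filter; pass levels=("notice", "warning", "error") to keep it.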
>>>
>>> On the "No route to host": I could ping h16 from h18 using the host name 
>>> without any problem.
>>>
>>> Two points:
>>> Why would h18 think h16 should be fenced?
>>> The failed assertion looks like a programming error.
>>
>>> Explanations?
>>>
>>> Regards,
>>> Ulrich
>>>
>>>
>>>
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/ 