[ClusterLabs] Re: Why not retry a monitor (pacemaker-execd) that got a segmentation fault?

Klaus Wenninger kwenning at redhat.com
Wed Jun 15 15:18:45 EDT 2022


On Wed, Jun 15, 2022 at 2:10 PM Ulrich Windl
<Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
> >>> Klaus Wenninger <kwenning at redhat.com> wrote on 15.06.2022 at 13:22 in
> message
> <CALrDAo3w1iZOPFV-5Bq=936hz_ctOzSm1DJKmPuiSY7G-BDofg at mail.gmail.com>:
> > On Wed, Jun 15, 2022 at 10:33 AM Ulrich Windl
> > <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >>
>
> ...
>
> >> (As said above it may be some RAM corruption where SMI (system management
> >> interrupts, or so) play a role, but Dell says the hardware is OK, and using
> >> SLES we don't have software support with Dell, so they won't even consider
> >> that fact.)
> >
> > That happens inside VMs, right? I mean, the nodes are VMs?
>
> No, it happens on the hypervisor nodes that are part of the cluster.
>

What I describe below also froze the whole machine, until
it was taken down by the hardware watchdog.

> > A couple of years back I had an issue running protected mode inside
> > KVM virtual machines on Lenovo laptops.
> > That really was an SMI issue (apparently there were problems when an SMI
> > was raised while the CPU was in protected mode) that went away after
> > disabling SMIs.
> > I have no idea if that is still possible with current chipsets. And I'm not
> > telling you to do that in production, but it might be interesting for
> > narrowing the issue down. One might run into thermal issues and the like,
> > which SMIs take care of on that hardware.
>
> Well, as I have no better idea, I'd probably even give "kick it hard with the foot" a chance ;-)

I don't know if it is of much use, but this is what I was using, IIRC:
https://github.com/zultron/smictrl
Jan wrote it back then for his laptop; mine showed the same behavior and,
being close enough chipset-wise, it did the trick on mine as well.
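
To check whether disabling SMIs has any effect at all, one option is to sample
the CPU's SMI counter before and after. A minimal sketch, assuming an Intel CPU
that implements MSR_SMI_COUNT (MSR 0x34), a loaded "msr" kernel module, and
root privileges; the function name and the default device path are
illustrative and not part of smictrl:

```python
import os
import struct

MSR_SMI_COUNT = 0x34  # Intel-specific MSR; assumed to exist on this CPU


def read_smi_count(msr_path="/dev/cpu/0/msr"):
    """Return the SMI count since reset, read from the given msr device."""
    fd = os.open(msr_path, os.O_RDONLY)
    try:
        # The msr device treats the file offset as the MSR address and
        # transfers 8 bytes per register.
        raw = os.pread(fd, 8, MSR_SMI_COUNT)
    finally:
        os.close(fd)
    (value,) = struct.unpack("<Q", raw)
    return value & 0xFFFFFFFF  # the counter occupies the low 32 bits
```

Calling read_smi_count() twice, some seconds apart, shows whether the counter
on CPU 0 still advances.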

Apparently, reading UEFI variables from the OS triggers some SMI activity as well.
So booting with a legacy BIOS, if possible, might be an interesting test case.
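
To confirm which firmware mode a node actually booted in for such a test:
Linux normally exposes /sys/firmware/efi only on a UEFI boot. A minimal
sketch; the function name and its parameter are illustrative:

```python
import os


def booted_via_uefi(efi_dir="/sys/firmware/efi"):
    """Return True if the running Linux kernel appears to have booted via UEFI."""
    # The kernel creates this sysfs directory when EFI runtime services
    # are in use; on a legacy BIOS (CSM) boot it is absent.
    return os.path.isdir(efi_dir)
```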

>
> Regards,
> Ulrich
>
> >
> > Klaus
> >>
> >> But actually I start believing such a system is a good playground for any HA
> >> solution ;-)
> >> Unfortunately here it's much more production than playground...
> >>
> >> Regards,
> >> Ulrich
> >>
> >>
> >> _______________________________________________
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/


