[ClusterLabs] normal reboot with active sbd does not work

Zoran Bošnjak zoran.bosnjak at via.si
Mon Jun 6 06:43:23 EDT 2022


This change in 'watchdog.service' resolves the reboot problem. Thanks.

---
[Unit]
Before=sbd.service

[Install]
RequiredBy=sbd.service
---
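
For reference, here is a sketch of how the complete /etc/systemd/system/watchdog.service might look with this change merged into the unit file quoted below. It assumes nothing else was changed; substitute softdog for ipmi_watchdog if that is the driver in use. Before= makes systemd start this unit ahead of sbd. Because stop order is the reverse of start order, systemd then stops it only after sbd has exited and closed /dev/watchdog:

--- file: /etc/systemd/system/watchdog.service
[Unit]
Description=Load watchdog timer module
After=syslog.target
# Order before sbd: at shutdown this unit is stopped only after sbd.service
# has stopped and released /dev/watchdog, so rmmod no longer fails.
Before=sbd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/modprobe ipmi_watchdog
ExecStop=/sbin/rmmod ipmi_watchdog

[Install]
WantedBy=multi-user.target
# Pull this unit in whenever sbd.service is started.
RequiredBy=sbd.service
---

[Install] settings only take effect when a unit is enabled. The unit therefore presumably needs to be re-enabled after editing ("systemctl daemon-reload", then "systemctl reenable watchdog.service") so that RequiredBy= actually links it to sbd.service.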

Zoran

----- Original Message -----
From: "Klaus Wenninger" <kwenning at redhat.com>
To: "users" <users at clusterlabs.org>
Sent: Friday, June 3, 2022 4:35:49 PM
Subject: Re: [ClusterLabs] normal reboot with active sbd does not work

On Fri, Jun 3, 2022 at 3:51 PM Zoran Bošnjak <zoran.bosnjak at via.si> wrote:
>
> Thanks for all your answers. Sorry, my mistake: ipmi_watchdog is indeed not blacklisted. I was first experimenting with "softdog", which is blacklisted. So the real question is how to properly load "softdog" on Ubuntu.
>
> The reason for unloading the watchdog module (ipmi or softdog) is that there seems to be a difference between a normal reboot and a watchdog reboot.
> In the case of an IPMI watchdog timer reboot:
> - the system hangs at the end of the reboot cycle for some time
> - the restart seems to be harder (like a power off/on cycle), and the BIOS runs more diagnostics at startup
> - it turns on the HW diagnostic indicator on the server front panel (Dell server), which stays on forever
> - it logs the event to iDRAC, which is unnecessary because it was not a hardware event, just a normal reboot
>
> For a "sudo reboot" command, I would like to skip all this, so the idea is to fully stop the watchdog just before the reboot. I am not sure how to do this properly.
>
> "softdog" is better in this respect. It does not trigger anything from the list above, but I still get this message during reboot:
> [ ... ] watchdog: watchdog0: watchdog did not stop!
> ... with some small timeout.
>
> So after some additional testing, the situation is the following:
>
> - without any watchdog and without the sbd package, the server reboots normally
> - with the "softdog" module loaded, I only get the "watchdog did not stop" message at reboot
> - with "softdog" loaded, but unloaded on shutdown with "ExecStop=...rmmod", the reboot is normal again
> - same as above, but with the "sbd" package installed, I get the "watchdog did not stop" message again
> - switching from "softdog" to "ipmi_watchdog" brings back the original list of problems
>
> It looks like "sbd" is preventing the watchdog from closing, so the watchdog always triggers, even on a normal reboot. What am I missing here?

sbd has the watchdog device open and is thus preventing the module from being unloaded.
Without any ordering instructions in your unit file, systemd will try to
stop the unit immediately, and the stop will therefore fail.
Have you tried:

[Unit]
Before=sbd.service

[Install]
RequiredBy=sbd.service

I would have expected that rebooting with the device disabled again
after sbd shuts down would behave similarly to rebooting with the module unloaded.
You could check whether something like 'nowayout' is set for the kernel module;
that would prevent disabling the watchdog once it has been opened.

Klaus
>
> Zoran
>
> ----- Original Message -----
> From: "Andrei Borzenkov" <arvidjaar at gmail.com>
> To: "users" <users at clusterlabs.org>
> Sent: Friday, June 3, 2022 11:24:03 AM
> Subject: Re: [ClusterLabs] normal reboot with active sbd does not work
>
> On 03.06.2022 11:18, Zoran Bošnjak wrote:
> > Hi all,
> > I would appreciate some advice about sbd fencing (without shared storage).
> >
> > I am using Ubuntu 20.04 with the default packages from the repository (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> >
> > A HW watchdog is present on the servers. The first problem was loading/unloading the watchdog module. For some reason the module is blacklisted on Ubuntu,
>
> What makes you think so?
>
> bor@bor-Latitude-E5450:~$ lsb_release -d
> Description:    Ubuntu 20.04.4 LTS
> bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog
> bor@bor-Latitude-E5450:~$
>
> > so I've created a service for this purpose.
> >
>
> man modules-load.d
>
>
> > --- file: /etc/systemd/system/watchdog.service
> > [Unit]
> > Description=Load watchdog timer module
> > After=syslog.target
> >
>
> Without any explicit dependencies, the stop will be attempted as soon as
> possible.
>
> > [Service]
> > Type=oneshot
> > RemainAfterExit=yes
> > ExecStart=/sbin/modprobe ipmi_watchdog
> > ExecStop=/sbin/rmmod ipmi_watchdog
> >
>
> Why on earth do you need to unload the kernel driver when the system reboots?
>
> > [Install]
> > WantedBy=multi-user.target
> > ---
> >
> > Is this the proper way to load the watchdog module under Ubuntu?
> >
>
> There is a standard way to load non-autoloaded drivers on *any* systemd-based
> distribution, which is modules-load.d.
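>
> A minimal sketch, assuming ipmi_watchdog is the driver needed (the file
> name is only an illustration; any name ending in .conf works).
> systemd-modules-load reads module names, one per line, from
> /etc/modules-load.d/*.conf at boot and never unloads them at shutdown:
>
> --- file: /etc/modules-load.d/ipmi_watchdog.conf
> # Load the IPMI watchdog driver at boot (lines starting with # are comments).
> ipmi_watchdog
> ---
>
> With this in place, the custom watchdog.service and its ExecStop=rmmod
> are not needed.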
>
> > Anyway, once the module is loaded, /dev/watchdog (which 'sbd' requires) is present.
> > Next, 'sbd' is installed with
> >
> > sudo apt install sbd
> > (followed by one reboot to get sbd active)
> >
> > The 'sbd' configuration is the default one. sbd reacts to a network failure as expected (it reboots the server). However, when 'sbd' is active, the server no longer reboots normally. For example, after "sudo reboot" from the command line, it gets stuck at the end of the reboot sequence with this message on the console:
> >
> > ... reboot progress
> > [ OK ] Finished Reboot.
> > [ OK ] Reached target Reboot.
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > ... it gets stuck at this point
> >
> > After a long timeout, the watchdog timer apparently expires and the server boots, but the failure indication remains on the server's front panel. If I uninstall the 'sbd' package, "sudo reboot" works normally again.
> >
> > My question is: how do I configure the system so that the 'sbd' function is present, but the system can still be rebooted normally?
> >
>
> As a first step, do not unload the watchdog driver on shutdown.
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/