[ClusterLabs] normal reboot with active sbd does not work
Klaus Wenninger
kwenning at redhat.com
Fri Jun 3 05:07:27 EDT 2022
On Fri, Jun 3, 2022 at 11:03 AM Klaus Wenninger <kwenning at redhat.com> wrote:
>
> On Fri, Jun 3, 2022 at 10:19 AM Zoran Bošnjak <zoran.bosnjak at via.si> wrote:
> >
> > Hi all,
> > I would appreciate some advice about sbd fencing (without shared storage).
> >
> > I am using Ubuntu 20.04, with default packages from the repository (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> >
> > A HW watchdog is present on the servers. The first problem was loading/unloading the watchdog module. For some reason the module is blacklisted on Ubuntu, so I've created a service for this purpose.
> >
> > --- file: /etc/systemd/system/watchdog.service
> > [Unit]
> > Description=Load watchdog timer module
> > After=syslog.target
> >
> > [Service]
> > Type=oneshot
> > RemainAfterExit=yes
> > ExecStart=/sbin/modprobe ipmi_watchdog
> > ExecStop=/sbin/rmmod ipmi_watchdog
> >
> > [Install]
> > WantedBy=multi-user.target
> > ---
> >
> > Is this a proper way to load the watchdog module under Ubuntu?
> >
> > Anyway, once the module is loaded, the /dev/watchdog (which is required by 'sbd') is present.
> > Next, the 'sbd' is installed by
> >
> > sudo apt install sbd
> > (followed by one reboot to get the sbd active)
> >
> > The 'sbd' configuration is the default. sbd reacts to a network failure as expected (it reboots the server). However, while 'sbd' is active, the server no longer reboots normally. For example, after "sudo reboot" from the command line, it gets stuck at the end of the reboot sequence. There is a message on the console:
> >
> > ... reboot progress
> > [ OK ] Finished Reboot.
> > [ OK ] Reached target Reboot.
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> > ... it gets stuck at this point
> >
> > After a long timeout, the watchdog timer apparently expires and the server reboots, but the failure indication remains on the front panel of the server. If I uninstall the 'sbd' package, "sudo reboot" works normally again.
> >
> > My question is: how do I configure the system so that the 'sbd' function is present but the system can still be rebooted normally?
>
> Loading modules - depending on distribution and version - should probably rather
> be done by editing /etc/modules or putting some files under /etc/modprobe.d/.
Of course that would require removing the driver from the blacklist.
Any reason why you didn't consider that?
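
As a rough sketch of that approach, assuming a Debian/Ubuntu layout where
/etc/modules is read at boot, the module would simply be listed by name.
Where the blacklist entry lives depends on the release, so it is not shown
here; it would have to be removed or commented out first, since the
boot-time loader is expected to skip blacklisted modules.

--- file: /etc/modules (sketch, assumed layout)
# one module name per line, loaded at boot
ipmi_watchdog
---
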
> Guess in your case stopping the unit won't work as the watchdog-device is
> still opened by sbd. In general I don't see why the watchdog-module should
> be unloaded upon shutdown. So as a first try you just might remove that part.
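>
> A sketch of the unit from the original post with only the ExecStop line
> dropped, so that nothing tries to unload the module while sbd still holds
> /dev/watchdog (untested, an illustration rather than a verified fix):
>
> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> # no ExecStop: the module stays loaded through shutdown
>
> [Install]
> WantedBy=multi-user.target
> ---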
>
> Klaus
>
> >
> > regards,
> > Zoran