[ClusterLabs] Antw: Re: Antw: [EXT] normal reboot with active sbd does not work

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Jun 7 01:43:02 EDT 2022


>>> Zoran Bošnjak <zoran.bosnjak at via.si> wrote on 03.06.2022 at 15:16 in
message <332746042.172.1654262179353.JavaMail.zimbra at via.si>:
> Yes, it's a Dell PowerEdge. Would you know how to disable the front-panel
> indication in case of a watchdog reset?

I had a Dell support call: resetting the iDRAC would reset the alert, but it
would also interrupt IPMI communication.
ipmitool -I open sel clear
may work, too, but it will clear the event log as well.
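Since clearing the SEL also throws away the record of the watchdog reset, it may be worth keeping a copy first. A minimal sketch, assuming the local "open" interface as in the command above: it saves a readable listing with "ipmitool sel elist" and only clears the log if that listing is non-empty. The function name backup_and_clear_sel and the IPMITOOL override variable are hypothetical helpers, not part of ipmitool.

```shell
# Sketch: save the SEL to a file, then clear it, so the watchdog-reset
# entries survive for later analysis.
# IPMITOOL is a hypothetical override hook (default: local "open" interface);
# it is word-split on purpose so extra ipmitool options can be included.
backup_and_clear_sel() {
    sel_backup=${1:?usage: backup_and_clear_sel BACKUP_FILE}
    ipmi="${IPMITOOL:-ipmitool -I open}"
    # Human-readable listing of all SEL entries, including the watchdog event
    $ipmi sel elist > "$sel_backup" || return 1
    # Refuse to clear the log if the backup came out empty
    if [ ! -s "$sel_backup" ]; then
        echo "SEL backup $sel_backup is empty, not clearing" >&2
        return 1
    fi
    $ipmi sel clear
}
```

Run as root, e.g. with a backup path such as /root/sel-before-clear.txt. Whether clearing the SEL actually removes the front-panel indication still has to be tried on the machine itself.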

Regards,
Ulrich



> 
> "echo V >/dev/watchdog" makes no difference.
> 
> ----- Original Message -----
> From: "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
> To: "users" <users at clusterlabs.org>
> Sent: Friday, June 3, 2022 11:00:18 AM
> Subject: [ClusterLabs] Antw: [EXT] normal reboot with active sbd does not 
> work
> 
>>>> Zoran Bošnjak <zoran.bosnjak at via.si> wrote on 03.06.2022 at 10:18 in
> message <2046503996.272.1654244336372.JavaMail.zimbra at via.si>:
>> Hi all,
>> I would appreciate some advice about sbd fencing (without shared storage).
> 
> Not an answer, but curiosity:
> As sbd needs very little space (just about 1 MB), did anybody ever try to use
> a small computer like a Raspberry Pi to provide shared storage for SBD, via
> iSCSI for example?
> The disk could be a partition of the flash card (it's written to quite
> rarely).
> 
> ...
>> After some long timeout, it looks like the watchdog timer expires and the
>> server boots, but the failure indication remains on the front panel of the
>> server.
> 
> 
> Dell PowerEdge? ;-)
> 
> In SLES I have these settings (among others):
> SBD_WATCHDOG_DEV=/dev/watchdog
> SBD_WATCHDOG_TIMEOUT=30
> SBD_TIMEOUT_ACTION=flush,reboot
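To double-check which values sbd will actually pick up from such a file, the settings can be printed from a subshell. A minimal sketch, assuming the file holds only simple KEY=VALUE lines like the three above (the file location differs between distributions); the helper name sbd_setting is hypothetical.

```shell
# Print the value of one setting from a simple KEY=VALUE sbd config file.
# The file is sourced in a subshell with "set -a" so every assignment is
# exported and can be read back with printenv, without touching the caller.
sbd_setting() {
    cfg=$1 key=$2
    # "." searches PATH for names without a slash, so force a relative path
    case $cfg in */*) ;; *) cfg=./$cfg ;; esac
    ( set -a; . "$cfg" && printenv "$key" )
}
```

For example, calling it with the config file path and SBD_WATCHDOG_TIMEOUT should print 30 for the settings shown above; it prints nothing if the key is not set.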
> 
> I did:
> h16:~ # echo iTCO_wdt > /etc/modules-load.d/watchdog.conf
> h16:~ # systemctl restart systemd-modules-load
> h16:~ # lsmod | egrep "(wd|dog)"
> iTCO_wdt               16384  0
> iTCO_vendor_support    16384  1 iTCO_wdt
> 
> Later I changed it to:
> h16:~ # echo ipmi_watchdog > /etc/modules-load.d/watchdog.conf
> h16:~ # systemctl restart systemd-modules-load
> 
> After reboot there was a conflict:
> Dec 04 12:07:22 h16 kernel: watchdog: wdat_wdt: cannot register miscdev on minor=130 (err=-16).
> Dec 04 12:07:22 h16 kernel: watchdog: wdat_wdt: a legacy watchdog module is probably present.
> h16:~ # lsmod | grep wd
> wdat_wdt               20480  0
> h16:~ # modprobe -r wdat_wdt
> h16:~ # modprobe ipmi_watchdog
> h16:~ # lsmod | grep wat
> ipmi_watchdog          32768  1
> ipmi_msghandler       114688  4 ipmi_devintf,ipmi_si,ipmi_watchdog,ipmi_ssif
> 
> h16:/etc/modprobe.d # cat 99-local.conf
> #
> # please add local extensions to this file
> #
> h16:/etc/modprobe.d # echo 'blacklist wdat_wdt' >> 99-local.conf
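Appending with ">>" adds another copy of the line every time the step is repeated. A small idempotent variant (the helper name ensure_line is hypothetical) only appends when the exact line is missing. Note that a blacklist entry only prevents automatic loading via the module's aliases; an explicit modprobe still loads it, and if wdat_wdt is part of the initramfs, that may need to be rebuilt (e.g. with dracut on SLES) before the change takes effect at boot.

```shell
# Append LINE to FILE only if an identical line is not already present,
# so re-running the step does not duplicate "blacklist wdat_wdt".
ensure_line() {
    file=$1 line=$2
    # -x: whole-line match, -F: fixed string, -q: no output; a missing file
    # counts as "line not present" and is created by the append
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage for the case above: ensure_line /etc/modprobe.d/99-local.conf 'blacklist wdat_wdt'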
> 
> Maybe also check whether "echo V >/dev/watchdog" stops the watchdog
> properly. SUSE (and meanwhile upstream as well, I guess) had to fix it.
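For reference, this is what that check relies on: with the Linux watchdog API, writing the character 'V' before closing the device asks a driver that supports "magic close" to stop the timer on close. It has no effect if the driver runs with nowayout, and opening fails with "Device or resource busy" while another process (such as sbd) keeps the device open, because /dev/watchdog can only be opened once. A minimal sketch; the helper names are hypothetical, and the nowayout attribute under /sys/class/watchdog only exists for drivers using the kernel watchdog core on kernels with watchdog sysfs support, so a legacy driver like ipmi_watchdog may not show up there.

```shell
# Sketch of the watchdog "magic close" done by hand.
# Opening the device starts the timer; a 'V' written before the close
# tells a magic-close capable driver to stop it again on close.
magic_close() {
    # Single open/write/close, like "echo V >/dev/watchdog" without the newline
    printf 'V' > "${1:-/dev/watchdog}"
}

# Print the nowayout flag of a watchdog-core device (1 = the timer cannot
# be stopped once started). Prints nothing if the attribute is absent.
watchdog_nowayout() {
    cat "${1:-/sys/class/watchdog/watchdog0}/nowayout" 2>/dev/null
}
```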
> 
> Hope this helps a bit.
> 
> Regards,
> Ulrich
> 
>> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
>> 
>> My question is: how do I configure the system so that the 'sbd' function
>> is present, but I can still reboot the system normally?
>> 
>> regards,
>> Zoran
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> ClusterLabs home: https://www.clusterlabs.org/ 
> 
> 
> 