[ClusterLabs] Antw: Re: Antw: [EXT] normal reboot with active sbd does not work
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Jun 7 01:43:02 EDT 2022
>>> Zoran Bošnjak <zoran.bosnjak at via.si> wrote on 03.06.2022 at 15:16 in
message <332746042.172.1654262179353.JavaMail.zimbra at via.si>:
> Yes, it's a Dell PowerEdge. Would you know how to disable the front panel
> indication in case of a watchdog reset?
I had a Dell support call about this: resetting the iDRAC clears the alert,
but it also interrupts IPMI communication. Running
ipmitool -I open sel clear
may work, too, but it clears the whole event log.
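
Since clearing the SEL discards its history, it may be worth saving a copy
first. A minimal sketch, assuming your ipmitool build provides the
"sel save <file>" subcommand; the function name sel_backup_and_clear, the
IPMITOOL override variable and the backup path are made up for illustration:

```shell
# Hypothetical helper: save the System Event Log to a file, then clear it.
# IPMITOOL can be overridden (e.g. for a dry run); it defaults to "ipmitool".
IPMITOOL="${IPMITOOL:-ipmitool}"

sel_backup_and_clear() {
    backup_file="$1"
    # Refuse to clear anything if no backup target was given.
    if [ -z "$backup_file" ]; then
        echo "usage: sel_backup_and_clear <backup-file>" >&2
        return 2
    fi
    # Save first; only clear if the save succeeded.
    "$IPMITOOL" -I open sel save "$backup_file" || return 1
    "$IPMITOOL" -I open sel clear
}
```

It could be called as "sel_backup_and_clear /root/sel-backup.txt". Whether
clearing the log also resets the front panel indication is something I have
not verified.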
Regards,
Ulrich
>
> "echo V >/dev/watchdog" makes no difference.
>
> ----- Original Message -----
> From: "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
> To: "users" <users at clusterlabs.org>
> Sent: Friday, June 3, 2022 11:00:18 AM
> Subject: [ClusterLabs] Antw: [EXT] normal reboot with active sbd does not
> work
>
>>>> Zoran Bošnjak <zoran.bosnjak at via.si> wrote on 03.06.2022 at 10:18 in
> message <2046503996.272.1654244336372.JavaMail.zimbra at via.si>:
>> Hi all,
>> I would appreciate some advice about sbd fencing (without shared storage).
>
> Not an answer, but curiosity:
> As sbd needs very little space (like just 1MB), did anybody ever try to use a
> small computer like a Raspberry Pi to provide shared storage for SBD via
> iSCSI, for example?
> The disk could be a partition of the flash card (it's written quite rarely).
>
> ...
>> After some long timeout, it looks like the watchdog timer expires and server
>> boots, but the failure indication remains on the front panel of the server.
>
>
> Dell PowerEdge? ;-)
>
> In SLES I have these settings (among others):
> SBD_WATCHDOG_DEV=/dev/watchdog
> SBD_WATCHDOG_TIMEOUT=30
> SBD_TIMEOUT_ACTION=flush,reboot
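
To check what a node is actually using, the values can be read back from the
configuration file. A minimal sketch, assuming the settings live in
/etc/sysconfig/sbd (the usual location on SLES; other distributions may
differ); the function name sbd_setting is made up for illustration:

```shell
# Hypothetical helper: print the value of KEY from a sysconfig-style file.
sbd_setting() {
    key="$1"
    conf_file="${2:-/etc/sysconfig/sbd}"
    # Take uncommented KEY=value lines, keep the last one, strip double quotes.
    sed -n "s/^${key}=//p" "$conf_file" | tail -n 1 | tr -d '"'
}
```

For example, "sbd_setting SBD_WATCHDOG_TIMEOUT" prints the configured timeout.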
>
> I did:
> h16:~ # echo iTCO_wdt > /etc/modules-load.d/watchdog.conf
> h16:~ # systemctl restart systemd-modules-load
> h16:~ # lsmod | egrep "(wd|dog)"
> iTCO_wdt 16384 0
> iTCO_vendor_support 16384 1 iTCO_wdt
>
> Later I changed it to:
> h16:~ # echo ipmi_watchdog > /etc/modules-load.d/watchdog.conf
> h16:~ # systemctl restart systemd-modules-load
>
> After reboot there was a conflict:
> Dec 04 12:07:22 h16 kernel: watchdog: wdat_wdt: cannot register miscdev on minor=130 (err=-16).
> Dec 04 12:07:22 h16 kernel: watchdog: wdat_wdt: a legacy watchdog module is probably present.
> h16:~ # lsmod | grep wd
> wdat_wdt 20480 0
> h16:~ # modprobe -r wdat_wdt
> h16:~ # modprobe ipmi_watchdog
> h16:~ # lsmod | grep wat
> ipmi_watchdog 32768 1
> ipmi_msghandler 114688 4 ipmi_devintf,ipmi_si,ipmi_watchdog,ipmi_ssif
>
> h16:/etc/modprobe.d # cat 99-local.conf
> #
> # please add local extensions to this file
> #
> h16:/etc/modprobe.d # echo 'blacklist wdat_wdt' >> 99-local.conf
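
Note that ">>" appends a new line every time it is run. A minimal sketch of an
idempotent variant, so that repeating the step does not duplicate the entry;
the function name blacklist_module is made up for illustration:

```shell
# Hypothetical helper: add "blacklist <module>" to a modprobe.d file once.
blacklist_module() {
    module="$1"
    conf_file="$2"
    # -x matches the whole line, -F treats the pattern as a fixed string,
    # -q suppresses output; append only if the exact line is missing.
    if ! grep -qxF "blacklist $module" "$conf_file" 2>/dev/null; then
        echo "blacklist $module" >> "$conf_file"
    fi
}
```

Usage would be "blacklist_module wdat_wdt /etc/modprobe.d/99-local.conf".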
>
> Maybe also check whether "echo V >/dev/watchdog" will stop the watchdog
> properly. SUSE (and meanwhile upstream, I guess) had to fix it.
>
> Hope this helps a bit.
>
> Regards,
> Ulrich
>
>> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
>>
>> My question is: How do I configure the system to keep the 'sbd' function
>> present, but still be able to reboot the system normally?
>>
>> regards,
>> Zoran
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/