[ClusterLabs] Wrong sbd.service dependencies
arvidjaar at gmail.com
Sun Dec 17 12:10:50 EST 2017
17.12.2017 15:20, Gao,Yan wrote:
> On 2017/12/16 16:59, Andrei Borzenkov wrote:
>> 04.12.2017 21:55, Andrei Borzenkov wrote:
>>>>> I tried it (on openSUSE Tumbleweed, which is what I have at hand; it has
>>>>> SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
>>>>> the disk at all.
>>>> It simply waits that long on startup before starting the rest of the
>>>> cluster stack to make sure the fencing that targeted it has returned.
>>>> It intentionally doesn't watch anything during this period of time.
>>> Unfortunately it waits too long.
>>> ha1:~ # systemctl status sbd.service
>>> ● sbd.service - Shared-storage based fencing daemon
>>> Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor preset: disabled)
>>> Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK; 4min 16s ago
>>> Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
>>> Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
>>> watch (code=killed, signa
>>> Main PID: 1792 (code=exited, status=0/SUCCESS)
>>> дек 04 21:45:32 ha1 systemd: Starting Shared-storage based fencing
>>> дек 04 21:47:02 ha1 systemd: sbd.service: Start operation timed out.
>>> дек 04 21:47:03 ha1 systemd: Failed to start Shared-storage based fencing daemon.
>>> дек 04 21:47:03 ha1 systemd: sbd.service: Unit entered failed state.
>>> дек 04 21:47:03 ha1 systemd: sbd.service: Failed with result
>>> But the real problem is that even though SBD failed to start, the whole
>>> cluster stack continues to run; and because SBD blindly trusts that nodes
>>> are well behaved, fencing appears to succeed after the timeout ... without
>>> anyone taking any action on the poison pill ...
>> That's an sbd bug. It declares itself as RequiredBy=corosync.service but
>> orders itself Before=pacemaker.service. By systemd design, service A
>> *MUST* have a Before dependency on service B if failure to start A should
>> cause failure to start B. *Or* use BindsTo ... but that sounds wrong
>> because it would cause B to start briefly and then be killed.
>> So the question is what is intended here. Should sbd.service be a
>> prerequisite for corosync or for pacemaker?
> It should be so only if it's enabled. Try this:
This is wrong; I commented on this pull request.
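
As a minimal sketch of the ordering rule quoted above (a hypothetical
fragment, not the unit file that sbd ships): RequiredBy= in [Install] makes
corosync.service pull in sbd.service, but a failed sbd start only blocks
corosync if sbd is also ordered before that same unit. Whether corosync or
pacemaker is the right unit to attach to is exactly the open question in this
thread, so corosync.service below is only an assumption for illustration.

```ini
# Hypothetical sbd.service fragment, for illustration only.
[Unit]
Description=Shared-storage based fencing daemon
# Ordering must name the same unit that requires us; with only
# Before=pacemaker.service, corosync starts regardless of sbd's result.
Before=corosync.service

[Install]
# Installs a Requires= dependency from corosync.service on this unit.
RequiredBy=corosync.service
```

With the requirement and the ordering both pointing at the same unit, a start
timeout like the one in the status output above should propagate as a failed
dependency instead of being ignored.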