[ClusterLabs] Wrong sbd.service dependencies

Andrei Borzenkov arvidjaar at gmail.com
Sun Dec 17 12:10:50 EST 2017


17.12.2017 15:20, Gao,Yan wrote:
> On 2017/12/16 16:59, Andrei Borzenkov wrote:
>> 04.12.2017 21:55, Andrei Borzenkov wrote:
>> ...
>>>>>
>>>>> I tried it (on openSUSE Tumbleweed, which is what I have at hand; it
>>>>> has SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to
>>>>> watch the disk at all.
>>>> It simply waits that long on startup before starting the rest of the
>>>> cluster stack to make sure the fencing that targeted it has returned.
>>>> It intentionally doesn't watch anything during this period of time.
>>>>
>>>
>>> Unfortunately it waits too long.
>>>
>>> ha1:~ # systemctl status sbd.service
>>> ● sbd.service - Shared-storage based fencing daemon
>>>     Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
>>> preset: disabled)
>>>     Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK;
>>> 4min 16s ago
>>>    Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
>>> status=0/SUCCESS)
>>>    Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
>>> watch (code=killed, signa
>>>   Main PID: 1792 (code=exited, status=0/SUCCESS)
>>>
>>> Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing
>>> daemon...
>>> Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out.
>>> Terminating.
>>> Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based
>>> fencing daemon.
>>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
>>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result
>>> 'timeout'.
>>>
>>> But the real problem is that even though SBD failed to start, the whole
>>> cluster stack continues to run; and because SBD blindly trusts that
>>> nodes are well behaved, fencing appears to succeed after the timeout ...
>>> without anyone taking any action on the poison pill ...
>>>
>>
>> That's an sbd bug. It declares itself as RequiredBy=corosync.service but
>> orders itself Before=pacemaker.service. By systemd design, service A
>> *MUST* have a Before dependency on service B if failure to start A should
>> cause failure to start B. *Or* use BindsTo ... but that sounds wrong
>> because it would cause B to start briefly and then be killed.
>>
>> So the question is what is intended here. Should sbd.service be a
>> prerequisite for corosync or pacemaker?
> It should be so only if it's enabled. Try this:
> https://github.com/ClusterLabs/sbd/pull/39
> 

That pull request is wrong; I commented on it.
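
For illustration of the ordering rule quoted above, here is a minimal sketch
of a drop-in, assuming sbd is meant to be a prerequisite for corosync (which
is exactly the open question). The drop-in path and file name are
hypothetical, and this is not necessarily what the pull request does.

```ini
# Hypothetical drop-in: /etc/systemd/system/sbd.service.d/order.conf
#
# Assumption: sbd.service should gate corosync.service.
# sbd.service already pulls itself in via RequiredBy=corosync.service,
# which gives corosync a Requires= dependency on sbd. A requirement alone
# does not propagate a start failure; the ordering below is what makes
# systemd skip starting corosync when sbd fails to start.
[Unit]
Before=corosync.service
```

After "systemctl daemon-reload", "systemctl show -p Before sbd.service"
should list corosync.service if the drop-in was picked up.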



