[ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

Wed Feb 16 12:30:56 EST 2022

On Wed, Feb 16, 2022 at 4:59 PM Klaus Wenninger <kwenning at redhat.com> wrote:

>
>
> On Wed, Feb 16, 2022 at 4:26 PM Klaus Wenninger <kwenning at redhat.com>
> wrote:
>
>>
>>
>> On Wed, Feb 16, 2022 at 3:09 PM Ulrich Windl <
>> Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>
>>> Hi!
>>>
>>> When changing some FC cables I noticed that sbd complained 2 seconds
>>> after the connection went down (even though the device is multipathed
>>> and other paths were still up).
>>> I don't know of any sbd parameter set so low that sbd would panic after
>>> 2 seconds. Which parameter (if any) is responsible for that?
>>>
>>> In fact multipath takes up to 5 seconds to adjust paths.
>>>
>>> Here are some sample events (sbd-1.5.0+20210720.f4ca41f-3.6.1.x86_64
>>> from SLES15 SP3):
>>> Feb 14 13:01:36 h18 kernel: qla2xxx [0000:41:00.0]-500b:3: LOOP DOWN
>>> detected (2 7 0 0).
>>> Feb 14 13:01:38 h18 sbd[6621]: /dev/disk/by-id/dm-name-SBD_1-3P2:
>>> error: servant_md: slot read failed in servant.
>>> Feb 14 13:01:38 h18 sbd[6619]: /dev/disk/by-id/dm-name-SBD_1-3P1:
>>> error: servant_md: mbox read failed in servant.
>>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
>>> /dev/disk/by-id/dm-name-SBD_1-3P1 is outdated (age: 11)
>>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Servant
>>> /dev/disk/by-id/dm-name-SBD_1-3P2 is outdated (age: 11)
>>> Feb 14 13:01:40 h18 sbd[6615]:  warning: inquisitor_child: Majority of
>>> devices lost - surviving on pacemaker
>>> Feb 14 13:01:42 h18 kernel: sd 3:0:3:2: rejecting I/O to offline device
>>> Feb 14 13:01:42 h18 kernel: blk_update_request: I/O error, dev sdbt,
>>> sector 2048 op 0x0:(READ) flags 0x4200 phys_seg 1 prio class 1
>>> Feb 14 13:01:42 h18 kernel: device-mapper: multipath: 254:17: Failing
>>> path 68:112.
>>> Feb 14 13:01:42 h18 kernel: sd 3:0:1:2: rejecting I/O to offline device
>>>
>> Sorry, I forgot to address the following.
>
> Guess your sbd-package predates
>
> https://github.com/ClusterLabs/sbd/commit/9e6cbbad9e259de374cbf41b713419c342528db1
> and thus doesn't properly destroy the io-context using the aio API.
> This flaw has been there more or less forever. I actually found it because
> of a kernel issue that made all block I/O done the way sbd does it
> (aio + O_SYNC + O_DIRECT) time out.
> I never managed to track it down to the real kernel issue while playing
> with kprobes, but it was gone with the next kernel update.
> Without survival on pacemaker the node would have suicided after the
> msgwait timeout (probably 10s in your case).
> It would be interesting to see what happens if you raise the msgwait
> timeout to a value that allows another read attempt.
> Does your setup actually recover? It is possible that it doesn't, since
> it is missing the fix referenced above.
>
One more thing:
Even if it looks as if it recovers, there might be a leak of kernel
resources (maybe per process),
so that issues surface only after the timeout has happened several times.

>
> Regards,
> Klaus
>
>>
>>> Most puzzling is the fact that sbd reports a problem 4 seconds before
>>> the kernel reports an I/O error. I guess sbd "times out" the pending read.
>>>
>> Yep - that is timeout_io, which defaults to 3s.
>> You can set it with the -I daemon start parameter.
>> Together with the rest of the default timeout scheme the 3s do make sense.
>> I'm not sure, but if you increase it significantly you might have to adapt
>> other timeouts.
>> There are a number of checks on the relationships between timeouts,
>> but they might not be exhaustive.
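
As a rough illustration of why the default fires here, the sketch below compares the 3s timeout_io with the "up to 5 seconds" multipath adjustment time mentioned in the original question. The function and constant names are hypothetical, and the model is deliberately simplistic: it only asks whether a read that is pending when a path fails can outlast the failover. It is not how sbd itself validates its timeouts.

```python
# Values quoted in this thread; the names are illustrative, not sbd identifiers.
TIMEOUT_IO_S = 3          # sbd async read timeout (default, set with -I)
MULTIPATH_FAILOVER_S = 5  # "multipath takes up to 5 seconds to adjust paths"


def read_survives_failover(timeout_io_s: int, failover_s: int) -> bool:
    """Return True if a read pending at path failure can outlast the failover.

    Assumption: the read is stuck for the whole failover period and only
    completes once multipath has switched to a working path.
    """
    return timeout_io_s > failover_s


if __name__ == "__main__":
    ok = read_survives_failover(TIMEOUT_IO_S, MULTIPATH_FAILOVER_S)
    print(f"timeout_io={TIMEOUT_IO_S}s, failover={MULTIPATH_FAILOVER_S}s, "
          f"pending read survives: {ok}")
```

With the values from the thread the pending read does not survive, which matches the "slot read failed" messages appearing before the kernel reports the I/O error. As noted above, raising timeout_io may require adapting other timeouts as well.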
>>
>>>
>>> The thing is: Both SBD disks are on different storage systems, each
>>> being connected by two separate FC fabrics, but still when disconnecting
>>> one cable from the host sbd panics.
>>> My guess is that if "surviving on pacemaker" had not happened, the
>>> node would have been fenced; is that right?
>>>
>>> The other thing I wonder is the "outdated age":
>>> How can the age be 11 (seconds) when the disk was disconnected 4 seconds
>>> ago?
>>> It seems that here the age is "current_time - time_of_last_read" instead
>>> of "current_time - time_when_read_attempt_started".
>>>
>> Exactly! And that is the correct way to do it, as we need to record the
>> time passed since the last successful read.
>> There is no value in starting the clock when we start the read attempt, as
>> these attempts are not synced throughout
>> the cluster.
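
The age arithmetic can be checked against the log excerpt with a small sketch. The function name is hypothetical and the time of the last successful read is not in the log; it is derived here by subtracting the reported age from the timestamp of the warning, so treat it as an inference from the quoted messages rather than a logged fact.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"


def servant_age(now: datetime, last_successful_read: datetime) -> int:
    """Whole seconds elapsed since the last successful read."""
    return int((now - last_successful_read).total_seconds())


# Timestamps taken from the log excerpt in this thread.
loop_down = datetime.strptime("13:01:36", FMT)       # kernel: LOOP DOWN
outdated_warn = datetime.strptime("13:01:40", FMT)   # "is outdated (age: 11)"

# Derived, not logged: when the last successful read must have happened.
last_ok = outdated_warn - timedelta(seconds=11)

if __name__ == "__main__":
    print("last successful read (derived):", last_ok.strftime(FMT))
    print("age since last successful read:", servant_age(outdated_warn, last_ok))
    print("seconds since LOOP DOWN:", servant_age(outdated_warn, loop_down))
```

Measured from the last successful read the age is 11, while only 4 seconds have passed since the link went down, which is the discrepancy asked about above.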
>>
>> Regards,
>> Klaus
>>
>>>
>>> Regards,
>>> Ulrich
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>>>