[ClusterLabs] Antw: Re: Antw: Re: Antw: Is there a way to ignore a single monitoring timeout

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Sep 1 09:36:43 EDT 2017


>>> Klechomir <klecho at gmail.com> wrote on 01.09.2017 at 13:15 in message
<258bd7d9-ed89-f1f0-8f9b-bca7420c681c at gmail.com>:
> What I observe is that a single monitoring request for different resources 
> with different resource agents is timing out.
> 
> For example LVM resource (the LVM RA) does this sometimes.

We had that, too, about six years ago. Since then we do not monitor the LV state (after having seen what the monitor actually does). The problem is that in a SAN environment with some hundred disks (due to multipath), LVM does not scale well: the more disks you have, the slower a command like vgdisplay becomes.
Maybe that has changed since then, but we are still happy. As we don't use raw access to the LVs, we don't really miss anything; the upper layers are monitored anyway...
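
For anyone who wants to check whether they are hitting the same scaling effect, a rough sketch (vg01 and p_lvm below are placeholder names, and the crmsh command is only one way to adjust the op):

    # How long does a plain VG scan take with all multipath devices visible?
    time vgdisplay vg01
    time vgs --noheadings vg01

    # If that already takes tens of seconds, raising the monitor timeout or
    # dropping the monitor op entirely (as we did) may be the pragmatic fix:
    crm configure edit p_lvm    # adjust or remove the "op monitor ..." line

Narrowing the device filter in /etc/lvm/lvm.conf can also shrink the scan time, since fewer devices get probed.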

> Setting ridiculously high timeouts (5 minutes and more) didn't solve the 
> problem, so I think I'm out of options there.
> Same for other I/O related resources/RAs.
> 
> Regards,
> Klecho
> 
> One of the typical cases is LVM (LVM RA) monitoring.
> 
> On 1.09.2017 11:07, Jehan-Guillaume de Rorthais wrote:
>> On Fri, 01 Sep 2017 09:07:16 +0200
>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>
>>>>>> Klechomir <klecho at gmail.com> wrote on 01.09.2017 at 08:48 in message
>>> <9f043557-233d-6c1c-b46d-63f8c2ee59c7 at gmail.com>:
>>>> Hi Ulrich,
>>>> Have to disagree here.
>>>>
>>>> I have cases where, for an unknown reason, a single monitoring request
>>>> never returns a result.
>>>> So having bigger timeouts doesn't resolve this problem.
>>> But if your monitor hangs instead of giving a result, you also cannot ignore
>>> the result that isn't there! OTOH: isn't that exactly what the operation
>>> timeout is for, monitors that hang? If the monitor is killed, it returns an
>>> implicit status (it failed).
>> I agree. It seems to me the problem comes from either the resource agent or
>> the resource itself. Presently this issue bothers the cluster stack, but
>> sooner or later it will break something else. Track down where the issue
>> comes from, and fix it.
>>
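
(As a sketch of the knobs under discussion: the timeout and the failure reaction are per-operation attributes, so a hanging monitor is killed when its timeout expires and is then counted as a failure. Resource and VG names below are placeholders, and the values are only examples.)

    primitive p_lvm ocf:heartbeat:LVM \
        params volgrpname=vg01 \
        op monitor interval=60s timeout=90s on-fail=restart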