[ClusterLabs] Antw: Re: Antw: Re: Antw: RES: Pacemaker and OCFS2 on stand alone mode

Tue Jul 12 15:19:27 EDT 2016

On 07/12/2016 01:16 AM, Ulrich Windl wrote:
>>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 12.07.2016 at 07:57 in
> message <578486E3.3030202 at gmail.com>:
>> 11.07.2016 09:33, Ulrich Windl wrote:
>>>>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 09.07.2016 at 10:17 in
>>> message <5780B30A.3000901 at gmail.com>:
>>>> 08.07.2016 09:11, Ulrich Windl wrote:
>>>>>>>> "Carlos Xavier" <cbastos at connection.com.br> wrote on 07.07.2016 at 18:57 in
>>>>> message <00e901d1d870$ae418000$0ac48000$@com.br>:
>>>>>> Thank you for the fast reply.
>>>>>>
>>>>>>>
>>>>>>> have you configured the stonith and drbd stonith handler?
>>>>>>>
>>>>>>
>>>>>> Yes. they were configured.
>>>>>> The cluster was running fine for more than 4 years, until we lost one host
>>>>>> to a power supply failure.
>>>>>> Now I need to access the files on the host that is working.
>>>>>
>>>>> Hi,
>>>>>
>>>>> IMHO: Have you ever tested the configuration? I wonder why the cluster did
>>>>> not do everything to continue.
>>>>>
>>>>
>>>> Stonith most likely failed if the node experienced a complete power failure. We
>>>
>>> You could see a message if that were the case; otherwise the cluster should
>>> assume the node was killed after the stonith timeout.
>>>
>>
>> I sincerely hope you do not really mean it. If stonith timed out, the
>> cluster cannot assume anything at all, and definitely *NOT* that the node
>> was killed.
> 
> What I mean is: there is no "success status" for STONITH; it is assumed that
> the node will be down after issuing a successful stonith command. You are
> claiming your stonith command was not logging any error, so the cluster will
> assume STONITH was successful after a timeout.

Fence agents do return success/failure; the cluster considers a timeout
to be a failure. The only time the cluster assumes a successful fence is
when an sbd-based watchdog is in use.
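
To illustrate the rule, here is a minimal Python sketch of how a caller
could treat a fence attempt. It is not pacemaker's code: the function name
fence_succeeded and the stand-in commands are hypothetical. The only point
is that "no answer within the timeout" is handled exactly like an explicit
failure, never as success.

```python
import subprocess
import sys


def fence_succeeded(command, timeout_s):
    """Return True only if `command` exits with status 0 within timeout_s seconds.

    Hypothetical helper for illustration; `command` stands in for a fence agent call.
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # No result in time: the target's state is unknown, so report failure.
        return False
    # A non-zero exit status is an explicit failure report from the agent.
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in commands (assumptions): one that succeeds, one that hangs.
    quick_ok = [sys.executable, "-c", "raise SystemExit(0)"]
    hangs = [sys.executable, "-c", "import time; time.sleep(5)"]
    print("quick agent:", fence_succeeded(quick_ok, 5))
    print("hanging agent:", fence_succeeded(hangs, 0.5))
```

With that logic, a node that lost power together with its fence device
stays "unfenced" from the caller's point of view until someone confirms
its state by other means.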

>>>> were not shown the cluster state, so it is just a guess; but normally the
>>>> way to recover is to manually declare the node as down. This does it
>>>> for pacemaker only, though; I do not know how to do the same for DRBD
>>>> (unless pacemaker somehow forwards this information to it).

DRBD and pacemaker do coordinate fencing, if configured to do so. The
DRBD config syntax varies by version, so check the docs on www.drbd.org
for the version you're using. There's a chapter on pacemaker integration.
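
As a rough illustration only, a DRBD 8.4-style fragment might look like the
following. The resource name r0 is a placeholder, the handler paths depend
on how your distribution packages DRBD, and the fencing policy shown is the
one usually suggested for dual-primary setups; other versions put these
options elsewhere, so treat this as a sketch and not something to paste in.

```
resource r0 {
  disk {
    # Assumed policy: suspend I/O until the peer has been fenced
    fencing resource-and-stonith;
  }
  handlers {
    # Helper scripts shipped with DRBD; the install path may differ
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```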

>>> DRBD has its own timeouts (AFAIR).
>>>
>>
>> That is not what I meant.