[ClusterLabs] Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Mar 3 01:56:37 EST 2021


>>> Eric Robinson <eric.robinson at psmnv.com> wrote on 02.03.2021 at 19:26 in message
<SA2PR03MB58847E37845FC6C92BC3007EFA999 at SA2PR03MB5884.namprd03.prod.outlook.com>

>>  -----Original Message-----
>> From: Users <users-bounces at clusterlabs.org> On Behalf Of Digimer
>> Sent: Monday, March 1, 2021 11:02 AM
>> To: Cluster Labs - All topics related to open-source clustering welcomed
>> <users at clusterlabs.org>; Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>> Subject: Re: [ClusterLabs] Antw: [EXT] Re: "Error: unable to fence '001db02a'"
...
>> >> Cloud fencing usually requires a higher timeout (20s reported here).
>> >>
>> >> Microsoft seems to suggest the following setup:
>> >>
>> >> # pcs property set stonith-timeout=900
>> >
>> > But doesn't that mean the other node waits 15 minutes after stonith
>> > until it performs the first post-stonith action?
>>
>> No, it means that if there is no reply by then, the fence has failed. If the
>> fence happens sooner, and the caller is told this, recovery begins very
>> shortly after.
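
The timeout semantics described above (an upper bound on waiting, not a fixed delay) can be illustrated with a small toy model. This is only a sketch, not how Pacemaker implements fencing: the function names, the simulated agent, and the 0.2 s / 5 s values are all hypothetical and chosen for illustration.

```python
import queue
import threading
import time


def wait_for_fence_result(results, timeout_s):
    """Return the fence result as soon as it arrives.

    timeout_s is an upper bound: the call fails only if no reply
    arrives within that time; it never waits longer than needed.
    """
    try:
        return results.get(timeout=timeout_s)
    except queue.Empty:
        return "failed: no reply before timeout"


def simulated_fence_agent(results, delay_s):
    # Hypothetical stand-in for a slow (e.g. cloud) fence agent.
    time.sleep(delay_s)
    results.put("fenced")


results = queue.Queue()
threading.Thread(
    target=simulated_fence_agent, args=(results, 0.2), daemon=True
).start()

start = time.monotonic()
outcome = wait_for_fence_result(results, timeout_s=5.0)
elapsed = time.monotonic() - start

# The caller resumes shortly after the agent replies, well before the timeout.
print(f"{outcome} after {elapsed:.1f}s (timeout was 5.0s)")
```

In this model the caller continues as soon as the reply is queued; only a missing reply causes the full timeout to elapse.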

How would the fencing be confirmed? I don't know.


>>
> 
> Interesting. Since users often report application failure within 1-3 minutes
> and my engineers begin investigating immediately, a technician could end up
> connecting to a cluster node after the stonith command was called, and could
> conceivably bring a failed node back up manually, only to have Azure finally
> get around to shooting it in the head. I don't suppose there's a way to
> abort/cancel a STONITH operation that is in progress?

I think you have to decide: Let the cluster handle the problem, or let the
admin handle the problem, but preferably not both.
I also think you cannot cancel a STONITH; you can only confirm it.

Regards,
Ulrich

...



