[ClusterLabs] Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

Mon Mar 1 02:50:15 EST 2021

>>> Valentin Vidic <vvidic at valentin-vidic.from.hr> wrote on 28.02.2021 at 16:59
in message <20210228155921.GM29617 at valentin-vidic.from.hr>:
> On Sun, Feb 28, 2021 at 03:34:20PM +0000, Eric Robinson wrote:
>> 001db02b rebooted. After it came back up, I tried it in the other direction.
>> 
>> On node 001db02b, the command...
>> 
>> # pcs stonith fence 001db02a
>> 
>> ...produced output...
>> 
>> Error: unable to fence '001db02a'.
>> 
>> However, node 001db02a did get restarted!
>> 
>> We also saw this error...
>> 
>> Failed Actions:
>> * stonith-001db02ab_start_0 on 001db02a 'unknown error' (1): call=70,
>>     status=Timed Out, exitreason='',
>>     last-rc-change='Sun Feb 28 10:11:10 2021', queued=0ms, exec=20014ms
>> 
>> When that happens, does Pacemaker take over the other node's resources, or not?
> 
> Cloud fencing usually requires a longer timeout than the 20s seen here.
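As a rough illustration of the 20s figure: the failed start action quoted above reports exec=20014ms, just over 20 seconds, which is consistent with the operation running into a 20-second limit. The sketch below pulls the status and duration fields out of such a line, assuming only the `key=value` layout shown in the quote; the 20 s limit is an assumption taken from the reported figure, not read from any cluster configuration.

```python
import re

# Failed-action line as quoted above (hyphens normalised to ASCII).
FAILED_ACTION = (
    "* stonith-001db02ab_start_0 on 001db02a 'unknown error' (1): call=70, "
    "status=Timed Out, exitreason='', "
    "last-rc-change='Sun Feb 28 10:11:10 2021', queued=0ms, exec=20014ms"
)

# Assumption: the operation ran under a 20-second timeout (the "20s" above).
ASSUMED_TIMEOUT_MS = 20_000


def parse_failed_action(line: str) -> dict:
    """Extract the status and the queued/exec durations (in ms) from a failed-action line."""
    status = re.search(r"status=([^,]+)", line)
    queued = re.search(r"queued=(\d+)ms", line)
    exec_ms = re.search(r"exec=(\d+)ms", line)
    if not (status and queued and exec_ms):
        raise ValueError("unrecognised failed-action line")
    return {
        "status": status.group(1),
        "queued_ms": int(queued.group(1)),
        "exec_ms": int(exec_ms.group(1)),
    }


info = parse_failed_action(FAILED_ACTION)
print(info["status"], info["exec_ms"])  # Timed Out 20014
print(info["exec_ms"] >= ASSUMED_TIMEOUT_MS)  # True: the action ran into the limit
```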
> 
> Microsoft seems to suggest the following setup:
> 
> # pcs property set stonith-timeout=900

But doesn't that mean the other node waits 15 minutes after stonith before it
performs the first post-stonith action?
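On the 15-minute question: assuming, as the Pacemaker documentation describes it, that stonith-timeout is how long the cluster waits for a fence action to complete, it behaves as an upper bound rather than a fixed delay. The sketch below only illustrates that general timeout pattern with a stand-in child process; it does not model Pacemaker or fence_azure_arm, and the function name and durations are hypothetical.

```python
import subprocess
import sys
import time


def run_fence_action_sketch(duration_s: float, timeout_s: float) -> tuple[str, float]:
    """Run a stand-in 'fence action' lasting duration_s, waiting at most timeout_s.

    Returns the outcome and how long the caller actually waited.
    """
    # Hypothetical agent: a child Python process that just sleeps for duration_s.
    cmd = [sys.executable, "-c", f"import time; time.sleep({duration_s})"]
    start = time.monotonic()
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
        outcome = "OK"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child once the timeout expires.
        outcome = "Timed Out"
    return outcome, time.monotonic() - start


# Fast action, generous timeout: the caller returns once the action finishes.
ok, waited_ok = run_fence_action_sketch(duration_s=0.2, timeout_s=10.0)
# Slow action, short timeout: the caller gives up at roughly the timeout.
late, waited_late = run_fence_action_sketch(duration_s=10.0, timeout_s=0.5)
print(ok, round(waited_ok, 1))
print(late, round(waited_late, 1))
```

In the sketch, the first call returns after roughly the action's own duration; only the slow action makes the caller wait out the full timeout.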

> # pcs stonith create rsc_st_azure fence_azure_arm username="login ID" \
>   password="password" resourceGroup="resource group" tenantId="tenant ID" \
>   subscriptionId="subscription id" \
>   pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
>   power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 \
>   pcmk_monitor_retries=4 pcmk_action_limit=3 \
>   op monitor interval=3600
> 
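The pcmk_host_map value in the command above maps cluster node names to the names the fence agent works with (here, Azure VM names) as `node:target` pairs separated by `;`. A minimal parsing sketch, assuming only the format used in that command; Pacemaker's own parser accepts more variations, which this does not attempt to cover, and `parse_host_map` is a hypothetical helper.

```python
def parse_host_map(value: str) -> dict[str, str]:
    """Split 'node:target;node:target' into a {node: target} dict."""
    mapping = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        node, sep, target = entry.partition(":")
        if not sep or not node or not target:
            raise ValueError(f"malformed host map entry: {entry!r}")
        mapping[node.strip()] = target.strip()
    return mapping


host_map = parse_host_map(
    "prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name"
)
print(host_map["prod-cl1-1"])  # prod-cl1-1-vm-name
```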
> https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/high-availability-guide-rhel-pacemaker
> 
> -- 
> Valentin