[ClusterLabs] Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Mar 1 02:50:15 EST 2021
>>> Valentin Vidic <vvidic at valentin-vidic.from.hr> wrote on 28.02.2021 at
16:59 in message <20210228155921.GM29617 at valentin-vidic.from.hr>:
> On Sun, Feb 28, 2021 at 03:34:20PM +0000, Eric Robinson wrote:
>> 001db02b rebooted. After it came back up, I tried it in the other direction.
>>
>> On node 001db02b, the command...
>>
>> # pcs stonith fence 001db02a
>>
>> ...produced output...
>>
>> Error: unable to fence '001db02a'.
>>
>> However, node 001db02a did get restarted!
>>
>> We also saw this error...
>>
>> Failed Actions:
>> * stonith-001db02ab_start_0 on 001db02a 'unknown error' (1): call=70, status=Timed Out, exitreason='',
>>     last-rc-change='Sun Feb 28 10:11:10 2021', queued=0ms, exec=20014ms
>>
>> When that happens, does Pacemaker take over the other node's resources, or not?
>
> Cloud fencing usually requires a higher timeout (20s reported here).
>
> Microsoft seems to suggest the following setup:
>
> # pcs property set stonith-timeout=900
But doesn't that mean the other node waits 15 minutes after stonith until it
performs the first post-stonith action?
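
To make the numbers concrete, here is a minimal shell sketch. It is not from the thread: the helper names secs_to_min and exec_ms are made up for illustration, and the 20 s limit is an assumption taken from the "20s reported here" remark. It only converts the proposed stonith-timeout to minutes and compares the reported exec time with that assumed limit; it does not talk to the cluster.

```shell
#!/bin/sh
# Hypothetical helper: convert a timeout in seconds to whole minutes.
secs_to_min() {
    echo $(( $1 / 60 ))
}

# Hypothetical helper: extract the exec=<n>ms value from a "Failed Actions" line.
exec_ms() {
    printf '%s\n' "$1" | sed -n 's/.*exec=\([0-9][0-9]*\)ms.*/\1/p'
}

# Second line of the failed action quoted above.
line="last-rc-change='Sun Feb 28 10:11:10 2021', queued=0ms, exec=20014ms"
limit_ms=20000   # assumption: a 20 s operation timeout

echo "stonith-timeout=900 is $(secs_to_min 900) minutes"
if [ "$(exec_ms "$line")" -gt "$limit_ms" ]; then
    echo "exec time $(exec_ms "$line") ms exceeds the assumed ${limit_ms} ms limit"
fi
```
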
> # pcs stonith create rsc_st_azure fence_azure_arm username="login ID" \
>     password="password" resourceGroup="resource group" tenantId="tenant ID" \
>     subscriptionId="subscription id" \
>     pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
>     power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 \
>     pcmk_monitor_retries=4 pcmk_action_limit=3 \
>     op monitor interval=3600
>
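
As an illustration of how the pcmk_host_map value in the quoted command is laid out, this small shell sketch splits it into node/VM pairs. The function name show_host_map is hypothetical, and the sketch assumes the plain "node:vm;node:vm" form shown in the quoted command; it does not query Pacemaker or Azure.

```shell
#!/bin/sh
# Hypothetical helper: print one "node -> VM name" line per map entry.
# Entries are separated by ';', node name and VM name by ':'.
show_host_map() {
    printf '%s\n' "$1" | tr ';' '\n' | while IFS=: read -r node vm; do
        echo "$node -> $vm"
    done
}

show_host_map "prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name"
```

Each output line pairs a cluster node name with the VM name that the fence agent would be asked to act on.
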
> https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/high-availability-guide-rhel-pacemaker
>
> --
> Valentin