[ClusterLabs] "Error: unable to fence '001db02a'" but It got fenced anyway
Valentin Vidić
vvidic at valentin-vidic.from.hr
Sun Feb 28 10:59:21 EST 2021
On Sun, Feb 28, 2021 at 03:34:20PM +0000, Eric Robinson wrote:
> 001db02b rebooted. After it came back up, I tried it in the other direction.
>
> On node 001db02b, the command...
>
> # pcs stonith fence 001db02a
>
> ...produced output...
>
> Error: unable to fence '001db02a'.
>
> However, node 001db02a did get restarted!
>
> We also saw this error...
>
> Failed Actions:
> * stonith-001db02ab_start_0 on 001db02a 'unknown error' (1): call=70, status=Timed Out, exitreason='',
> last-rc-change='Sun Feb 28 10:11:10 2021', queued=0ms, exec=20014ms
>
> When that happens, does Pacemaker take over the other node's resources, or not?

Cloud fencing usually needs a longer timeout: the start operation above
timed out after about 20s (exec=20014ms).
Microsoft seems to suggest the following setup:
# pcs property set stonith-timeout=900
# pcs stonith create rsc_st_azure fence_azure_arm username="login ID" \
    password="password" resourceGroup="resource group" tenantId="tenant ID" \
    subscriptionId="subscription id" \
    pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
    power_timeout=240 pcmk_reboot_timeout=900 pcmk_monitor_timeout=120 \
    pcmk_monitor_retries=4 pcmk_action_limit=3 \
    op monitor interval=3600
https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/high-availability-guide-rhel-pacemaker
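
To see why the quoted failure looks like a timeout problem, here is a rough sketch that pulls the exec= value out of the failed action line and compares it with a limit. The 20000ms limit is my assumption, inferred from the 20014ms figure; it is not read from the cluster configuration, and the variable names are just for illustration.

```shell
# Failed action line copied from the report quoted above (joined onto one line).
line="* stonith-001db02ab_start_0 on 001db02a 'unknown error' (1): call=70, status=Timed Out, exitreason='', last-rc-change='Sun Feb 28 10:11:10 2021', queued=0ms, exec=20014ms"

# Extract the execution time in milliseconds from the exec= field.
exec_ms=$(printf '%s\n' "$line" | sed -n 's/.*exec=\([0-9][0-9]*\)ms.*/\1/p')

# Assumed limit of 20 s; adjust to whatever timeout is actually configured.
limit_ms=20000

if [ "$exec_ms" -ge "$limit_ms" ]; then
    verdict="start ran ${exec_ms}ms, at or past the ${limit_ms}ms limit: timed out"
else
    verdict="start ran ${exec_ms}ms, under the ${limit_ms}ms limit"
fi
echo "$verdict"
```

If the operation ends right at the limit like this, the fence agent was probably still waiting on the cloud API, which would also fit the node getting restarted even though the command reported an error.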
--
Valentin