[ClusterLabs] How to set up fencing/stonith

Casey & Gina caseyandgina at icloud.com
Fri May 18 21:32:00 UTC 2018


> May 18 20:36:27 [4282] d-gp2-dbpg0-2 stonith-ng:  warning: log_operation:       vfencing:16264 [ Performing: stonith -t external/vcenter -T reset d-gp2-dbpg0-1 ]
> May 18 20:36:27 [4282] d-gp2-dbpg0-2 stonith-ng:  warning: log_operation:       vfencing:16264 [ failed: d-gp2-dbpg0-1 5 ]

I tried to execute the above command manually, but it failed with "Must specify either -p option, -F option, -E option, or name=value style arguments".

So I added in the parameters from the pcs command and tried this:

stonith -t external/vcenter VI_SERVER="10.124.137.100" VI_CREDSTORE="/etc/pacemaker/vicredentials.xml" HOSTLIST="d-gp2-dbpg0-1=d-gp2-dbpg0-1;d-gp2-dbpg0-2=d-gp2-dbpg0-2;d-gp2-dbpg0-3=d-gp2-dbpg0-3" RESETPOWERON="0" -T reset d-gp2-dbpg0-1
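Since that HOSTLIST string is easy to mistype by hand, here is a rough sketch of how the same command line could be assembled from a list of node names. It only prints the command, it does not run it. The helper names build_hostlist and print_stonith_cmd are made up for illustration, and it assumes every cluster node name is identical to its VM name, which happens to be true in my setup.

```shell
#!/bin/sh
# Build the HOSTLIST value ("node=vm;node=vm;...") from node names.
# Assumption: each cluster node name equals its VM name.
build_hostlist() {
    list=""
    for node in "$@"; do
        # Add a ';' separator before every entry except the first.
        list="${list:+$list;}$node=$node"
    done
    printf '%s\n' "$list"
}

# Print (do not execute) the stonith command for an action and a target.
# Usage: print_stonith_cmd ACTION TARGET NODE...
print_stonith_cmd() {
    action=$1
    target=$2
    shift 2
    printf 'stonith -t external/vcenter VI_SERVER="%s" VI_CREDSTORE="%s" HOSTLIST="%s" RESETPOWERON="%s" -T %s %s\n' \
        "$VI_SERVER" "$VI_CREDSTORE" "$(build_hostlist "$@")" \
        "$RESETPOWERON" "$action" "$target"
}

# Values taken from the command above.
VI_SERVER="10.124.137.100"
VI_CREDSTORE="/etc/pacemaker/vicredentials.xml"
RESETPOWERON="0"

print_stonith_cmd reset d-gp2-dbpg0-1 \
    d-gp2-dbpg0-1 d-gp2-dbpg0-2 d-gp2-dbpg0-3
```

Printing the command first makes it easy to compare against the parameters given to pcs before actually fencing anything.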

This resulted in the following error:

external/vcenter[30902]: WARN: Tried to ResetVM d-gp2-esx-1215.imovetv.com:d-gp2-dbpg0-1 that was poweredOff
external/vcenter[30902]: ERROR: [reset d-gp2-dbpg0-1] Could not complete d-gp2-esx-1215.imovetv.com:d-gp2-dbpg0-1 power cycle
external/vcenter[30902]: ERROR: [reset d-gp2-dbpg0-1] Died at /usr/lib/stonith/plugins/external/vcenter line 22.

I tried running the same command with 'on' instead of 'reset', and it successfully powered the VM back on:

external/vcenter[30925]: info: Machine d-gp2-esx-1215.imovetv.com:d-gp2-dbpg0-1 has been powered on
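To make that manual test repeatable, something like the following wrapper could try a reset first and fall back to 'on' when the reset fails, which is what I did by hand above. This is only a sketch for manual testing, not something the cluster itself does. The function name reset_or_power_on and the STONITH_BIN override are hypothetical, and the name=value parameters are passed through unchanged. If I read the plugin's parameter description correctly, RESETPOWERON="1" may make the plugin handle the powered-off case on its own, but I have not verified that.

```shell
#!/bin/sh
# STONITH_BIN is a hypothetical override so the logic can be tried
# against a stub instead of the real stonith binary.
STONITH_BIN=${STONITH_BIN:-stonith}

# Try a reset; if it fails (for example because the VM is already
# powered off), fall back to a plain power-on.
# Usage: reset_or_power_on TARGET name=value...
reset_or_power_on() {
    target=$1
    shift
    if "$STONITH_BIN" -t external/vcenter "$@" -T reset "$target"; then
        echo "reset of $target succeeded"
    elif "$STONITH_BIN" -t external/vcenter "$@" -T on "$target"; then
        echo "reset of $target failed; powered it on instead"
    else
        echo "could not reset or power on $target" >&2
        return 1
    fi
}
```

It would be called with the target first and the same parameters as before, e.g. reset_or_power_on d-gp2-dbpg0-1 VI_SERVER="10.124.137.100" followed by the remaining name=value arguments.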

So then, instead of powering off the VM in vSphere, I ran `killall -9 corosync` on the primary.  This resulted in the VIP coming up on node 3 and node 1 being rebooted.  Great!

So now my concern is this: our VMs are distributed across 32 hosts.  One condition we were hoping to handle is the failure of one of those host machines, due to bad memory or something else, since it is likely that the nodes within a cluster do not all reside on the same VM host (there may even be some way to configure them to stay on separate hosts in ESX).  In that case, I'd assume a reset command will fail as well.  I had thought that when a node was fenced, it was done with an 'off' command, and that its resources would then be brought up on a standby node.  Is there a way to make this work?

Thanks again, learning more step-by-step!
-- 
Casey

