[ClusterLabs] question about fence-virsh

Digimer lists at alteeve.ca
Fri May 19 22:53:13 UTC 2017


On 19/05/17 05:30 PM, Ken Gaillot wrote:
> On 05/19/2017 03:47 PM, Andrew Kerber wrote:
>> What I am trying to say here is when I get one of the virtual machines
>> in a bad state, I can still log in and reboot it with the reboot
>> command. But I need my fencing resource to handle that reboot.
>>
>> On Fri, May 19, 2017 at 1:32 PM, Andrew Kerber
>> <andrew.kerber at gmail.com> wrote:
>>
>>     Thanks for the answer, but that's not the problem.  I don't have
>>     access to the console; it's a security issue.  I only have access
>>     within the virtual machines, so I want to send the reboot command
>>     within the virtual machine, not to the console. Typically our
>>     hangups are such that the reboot command works, and the machine
>>     hangs at starting back up, and I get an admin to go hit the console.
> 
> What you're asking for is an "ssh" fence agent. While such agents can
> be found, they are not considered reliable fence agents.
> 
> Your *typical* problem may be solvable with running "reboot" inside the
> VM, but there are situations in which that won't work (kernel panic,
> loss of network connectivity in the VM, crippling load, etc.). Only
> access to the hypervisor can provide a reliable fence mechanism for the VM.
> 
> If you're lucky, whoever is providing your VM can also provide you an
> API to use to request a hard reboot of the VM at the hypervisor level.
> Then, you can see if there is a fence agent already written for that
> API, or modify an existing one to handle it.
> 
> If you can't even get API access to the hypervisor, then you're not
> going to get full HA. You could search for an ssh fence agent, but be
> aware that's a partial solution at best, and you won't be able to
> recover from certain failure scenarios.

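Ken is correct. Fencing must work no matter what state the victim is in.
You can see this by running 'echo c > /proc/sysrq-trigger' on a node to
cause a kernel panic: an ssh-based fence agent cannot log in to the
panicked node, so the fence never succeeds and your cluster will hang.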
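
A rough sketch of that test, for a throwaway test cluster only. The
run() helper and the DRY_RUN switch are assumptions added here for
safety, not part of any cluster tool; with the default DRY_RUN=1 the
script only prints what it would do. The first two commands belong on
the victim node (as root), the last one on a surviving node.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        sh -c "$*"
    fi
}

# On the victim node, as root: enable sysrq, then force a kernel panic.
run "echo 1 > /proc/sys/kernel/sysrq"
run "echo c > /proc/sysrq-trigger"

# On a surviving node: take a one-shot look at cluster status to see
# whether the victim was fenced or the cluster is stuck waiting.
run "crm_mon -1"
```

If the fence agent only works over ssh into the VM, the surviving node
should be expected to keep waiting on a fence that cannot complete.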

You need to talk to your security team to get access to the hypervisor.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould



