[ClusterLabs] Pacemaker kill does not cause node fault ???

Digimer lists at alteeve.ca
Tue Jan 31 02:58:01 CET 2017


On 10/01/17 05:24 AM, Stefan Schloesser wrote:
> Hi,
> 
> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup seems to be working ok including the STONITH.
> For test purposes I issued a "pkill -f pace" killing all pacemaker processes on one node.
> 
> Result:
> The node is marked as "pending", all resources stay on it. If I manually kill a resource it is not noticed. On the other node a drbd "promote" command fails (drbd is still running as master on the first node).
> 
> Killing the corosync process works as expected -> STONITH.
> 
> Could someone shed some light on this behavior? 
> 
> Thanks,
> 
> Stefan

A good way to test fencing is to crash the OS with 'echo c >
/proc/sysrq-trigger', which causes an immediate kernel crash. The only
recovery is a reboot, so it's excellent for simulating a hung node.
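
As a rough sketch, the crash test could be wrapped in a small helper
like the one below. The function name crash_node and its optional path
argument are made up for illustration; the argument is only there so
the helper can be dry-run against a scratch file instead of the real
trigger. It has to be run as root on the node you want to kill.

```shell
#!/bin/sh
# crash_node [TRIGGER_PATH]
# Hypothetical helper: writes 'c' to the sysrq trigger, which asks the
# kernel to crash immediately. TRIGGER_PATH defaults to the real
# /proc/sysrq-trigger; pass a scratch file to dry-run the helper.
crash_node() {
    trigger="${1:-/proc/sysrq-trigger}"
    # With the real trigger, nothing after this line runs on this node.
    echo c > "$trigger"
}
```

Calling crash_node with no argument on the victim, while watching
crm_mon on the surviving node, should show the peer being fenced if
STONITH is configured properly.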

Make sure, too, that you've hooked DRBD's fencing into pacemaker by
setting 'fencing resource-and-stonith;' and using crm-fence-peer.sh and
crm-unfence-peer.sh as the fence and unfence handlers.
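
One way to sanity-check this, as a sketch only: pipe the parsed DRBD
configuration through a small filter that looks for the fencing policy
and the fence handler. The function name check_drbd_fencing is
hypothetical, and the patterns assume the dump prints the policy as
'fencing resource-and-stonith' and references a script whose name
contains 'crm-fence-peer'.

```shell
#!/bin/sh
# check_drbd_fencing
# Hypothetical helper: reads a DRBD configuration on stdin and reports
# whether the fencing policy and the pacemaker fence handler are present.
check_drbd_fencing() {
    conf=$(cat)
    if ! printf '%s\n' "$conf" | grep -Eq 'fencing[[:space:]]+resource-and-stonith'; then
        echo "missing: fencing resource-and-stonith"
        return 1
    fi
    if ! printf '%s\n' "$conf" | grep -q 'crm-fence-peer'; then
        echo "missing: crm-fence-peer handler"
        return 1
    fi
    echo "ok"
}
```

On a node with DRBD installed this could be used as
'drbdadm dump all | check_drbd_fencing'.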

If these are bare-iron nodes, also test by pulling the power out of the
node entirely while it is running. If you can pass both of these tests,
you will have simulated most possible node failure modes (I say
'most' because it is impossible to think of everything :) ).

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould


