<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 12pt;
font-family:Calibri
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>> Date: Fri, 4 Jul 2014 23:17:07 +0900<br><div>> From: lists@alteeve.ca<br>> To: pacemaker@oss.clusterlabs.org<br>> Subject: Re: [Pacemaker] Creating a safe cluster-node shutdown script (for when UPS goes OnBattery+LowBattery)<br>> <br>> On 04/07/14 02:16 PM, Giuseppe Ragusa wrote:<br>> > Hi all,<br>> > I'm trying to create a script as per subject (on CentOS 6.5,<br>> > CMAN+Pacemaker, only DRBD+KVM active/passive resources; SNMP-UPS<br>> > monitored by NUT).<br>> ><br>> > Ideally, I think each node should stop (disable) all locally running<br>> > VirtualDomain resources (doing so cleanly demotes and then stops the DRBD<br>> > resources underneath), then put itself in standby and finally shut down.<br>> ><br>> > On the next startup, manual intervention would be required to unstandby<br>> > all nodes and re-enable resources (nodes already in standby and resources<br>> > already disabled before the blackout would have to be told apart manually).<br>> ><br>> > Is this strategy conceptually safe?<br>> ><br>> > Unfortunately, various searches have turned up no "prior art" :)<br>> <br>> I started work on something similar with apcupsd (first I had to make it <br>> work with multiple UPSes, which I did). Then I decided not to actually <br>> implement, and decided instead to leave it up to an admin to decide <br>> how/when/if to initiate a graceful shutdown.<br>> <br>> My rationale was that this placed way too much potential damage in the <br>> hands of, effectively, a single trigger. One bad bug and you could bring <br>> down a perfectly fine cluster.<br><br>Perfectly reasonable; in fact, I was limiting my effort to a single, narrowly defined case.<br><br>> Instead, what I did was ensure that any power event triggered an alert <br>> email (x2, as both nodes ran the monitoring app). This way, I (and the <br>> client's admins) would be notified immediately if anything happened. 
<br>> Then it was up to us to decide how/if to initiate a graceful shutdown.<br><br>My client's business setup is peculiar too: too big to disregard HA solutions, but too small to keep staff/consultants on call for "secondary" emergencies (like power being out for extended periods during summer storms
etc.).<br><br>> One real-world example:<br>> <br>> A couple months ago, a client's neighborhood was hit with a prolonged <br>> power outage. Eventually, we decided to gracefully shut down. However, <br>> one of the Windows VMs had downloaded and prepped to install about 30 <br>> updates (no idea how this happened, except Windows). Anyway, the VM took <br>> more time to shut down than the batteries could support. So half-way <br>> through, we withdrew one node and powered it off to shed load and gain <br>> battery runtime. This kind of logic cannot reasonably be coded into a <br>> script.<br><br>Enlightening tale!<br><br>Thinking about it: I suppose that more VM-intensive needs (VDI etc.) would call for VM-specific HA solutions (like oVirt/OpenStack), where VMs can be treated exactly like physical machines (install UPS agents on the guest OS and let them go). On a "classic" HA clustering solution, instead, I suppose VMs should be server VMs (or be treated as such), and even Windows admins know multiple ways (interactive, GPO, registry) to keep the installation of updates under control (typically "interactive installation during a maintenance window"). Leaving the "install by default on shutdown" option enabled does not speak well of those admins ;><br><br>> My $0.02.<br>> <br>> -- <br>> Digimer<br><br>Many thanks for your suggestions and for sharing your experience!<br><br>Regards,<br>Giuseppe<br><br></div> </div></body>
</html>