[Pacemaker] Creating a safe cluster-node shutdown script (for when UPS goes OnBattery+LowBattery)

Digimer lists at alteeve.ca
Fri Jul 4 10:17:07 EDT 2014


On 04/07/14 02:16 PM, Giuseppe Ragusa wrote:
> Hi all,
> I'm trying to create a script as per subject (on CentOS 6.5,
> CMAN+Pacemaker, only DRBD+KVM active/passive resources; SNMP-UPS
> monitored by NUT).
>
> Ideally I think that each node should stop (disable) all locally-running
> VirtualDomain resources (doing so cleanly demotes then downs the DRBD
> resources underneath), then put itself in standby and finally shutdown.
>
> On further startup, manual intervention would be required to unstandby
> all nodes and enable resources (nodes already in standby and resources
> already disabled before blackout should be manually distinguished).
>
> Is this strategy conceptually safe?
>
> Unfortunately, various searches have turned up no "prior art" :)
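For illustration only, the sequence proposed above (disable the local VirtualDomain resources, put the node in standby, power off, then let an admin restore only what the script itself changed) could be sketched in Python as below. The function names, the JSON snapshot file and the node/resource names are hypothetical. The crm_resource and crm_attribute options are those of the Pacemaker 1.1-era tools and should be checked against the installed man pages. Nothing is executed: the functions only build command lists and compare sets.

```python
import json
from pathlib import Path

# Sketch only: assumes Pacemaker 1.1-era crm_resource/crm_attribute options.
# No command is run here; the functions just build and compare data.


def shutdown_plan(local_node, local_vms):
    """Return the ordered commands one node would run on OnBattery+LowBattery."""
    plan = []
    for vm in local_vms:
        # Disable the VirtualDomain resource so Pacemaker stops it cleanly;
        # the question assumes constraints then take DRBD down beneath it.
        plan.append(["crm_resource", "--resource", vm, "--meta",
                     "--set-parameter", "target-role",
                     "--parameter-value", "Stopped"])
    # NOTE: a real script must wait here until the stops have completed
    # (e.g. by polling crm_mon); that step is deliberately omitted.
    plan.append(["crm_attribute", "--node", local_node,
                 "--name", "standby", "--update", "on"])
    plan.append(["shutdown", "-h", "now"])
    return plan


def save_snapshot(path, standby_nodes, stopped_resources):
    """Record which nodes/resources were already off *before* the outage.

    The JSON layout is hypothetical; any persistent format would do.
    """
    Path(path).write_text(json.dumps({
        "standby_nodes": sorted(standby_nodes),
        "stopped_resources": sorted(stopped_resources),
    }))


def items_to_restore(path, touched_nodes, touched_resources):
    """Return (nodes, resources) that the script itself changed.

    Anything already in standby or disabled before the blackout is excluded,
    so the admin only sees what is reasonable to consider re-enabling.
    """
    before = json.loads(Path(path).read_text())
    nodes = sorted(set(touched_nodes) - set(before["standby_nodes"]))
    resources = sorted(set(touched_resources) - set(before["stopped_resources"]))
    return nodes, resources
```

At startup an admin would call items_to_restore() and re-enable the listed nodes and resources by hand, which keeps the manual step the question asks for and avoids reviving anything that was deliberately off before the blackout.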

I started work on something similar with apcupsd (first I had to make it 
work with multiple UPSes, which I did). Then I decided not to actually 
implement it, and instead to leave it up to an admin to decide 
how/when/if to initiate a graceful shutdown.

My rationale was that this placed way too much potential damage in the 
hands of, effectively, a single trigger. One bad bug and you could bring 
down a perfectly fine cluster.

Instead, what I did was ensure that any power event triggered an alert 
email (x2, as both nodes ran the monitoring app). This way, I (and the 
client's admins) would be notified immediately if anything happened. 
Then it was up to us to decide how/if to initiate a graceful shutdown.
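A minimal sketch of this alert-only approach, assuming NUT (as in the question) rather than apcupsd: upsmon can run a NOTIFYCMD for events flagged EXEC, and exports the event type in NOTIFYTYPE and the UPS name in UPSNAME. The function names, the recipient address and the local SMTP relay are hypothetical. Each node runs its own copy, which is what produces one email per node for every event.

```python
import smtplib
import socket
from email.message import EmailMessage

# Hypothetical recipients; replace with the real admin addresses.
ADMINS = ["admin@example.com"]


def build_power_alert(ups_name, event, host=None):
    """Build (but do not send) an alert email for one UPS event.

    `event` is the NUT notify type, e.g. "ONBATT", "LOWBATT" or "ONLINE".
    """
    host = host or socket.gethostname()
    msg = EmailMessage()
    msg["Subject"] = f"[{host}] UPS {ups_name}: {event}"
    msg["From"] = f"upsmon@{host}"
    msg["To"] = ", ".join(ADMINS)
    msg.set_content(
        f"Node {host} reports UPS event {event} on {ups_name}.\n"
        "No automatic shutdown has been started. An admin must decide\n"
        "whether and when to shut the cluster down gracefully.\n"
    )
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP relay (assumed to run on each node)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

NOTIFYCMD would point at a small wrapper that calls send_alert(build_power_alert(os.environ["UPSNAME"], os.environ["NOTIFYTYPE"])). The script deliberately stops at notification; the decision to shut down stays with a human.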

One real-world example:

A couple months ago, a client's neighborhood was hit with a prolonged 
power outage. Eventually, we decided to gracefully shut down. However, 
one of the Windows VMs had downloaded and prepped to install about 30 
updates (no idea how this happened, except Windows). Anyway, the VM took 
more time to shut down than the batteries could support. So halfway 
through, we withdrew one node and powered it off to shed load and gain 
battery runtime. This kind of logic cannot reasonably be coded into a 
script.

My $0.02.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



