[Pacemaker] notifications for cloned resources

Thu Aug 14 15:49:44 EDT 2014

On Thu, Aug 14, 2014 at 12:38:00PM +1000, Andrew Beekhof wrote:
> 
> On 14 Aug 2014, at 12:33 am, Steve Feehan <feehans at ncbi.nlm.nih.gov> wrote:
> 

> Is it a problem that several seconds could go by between the node going offline and the notification arriving?
> I would usually expect the answer to be yes.

When a node is offline, all of its VMs are down and will need to be
restarted.  It will take harep several minutes (at least) to get them
started. The sooner it starts the better, but a delay of several
seconds would hardly make a difference.

> Those that do care (eg. cluster filesystems) usually have a daemon that a) monitors the corosync membership directly and/or b) subscribes to stonithd fencing notifications.
> They do this because they can't wait for resource based notification.

Is there an example of method a) or b) that I can use as a starting point?

> What is the usecase for nesting a ganeti cluster inside a pacemaker one?

I'm not really sure. It would certainly be easier if ganeti handled
marking the node offline. It provides hooks for fencing and the harep
utility for healing the cluster. So it's 90% of the way to a full HA
solution.

Maybe the ganeti folks don't want to reinvent the wheel. Or maybe they
don't want to own the decision of when to fence/offline a node. harep can
perform potentially dangerous actions. Depending on the configuration,
it can go as far as reinstalling VMs.

-- 
Steve Feehan [Contractor]