[ClusterLabs] Coming in 1.1.15: Event-driven alerts

Thu Apr 21 23:09:39 UTC 2016

Ken Gaillot <kgaillot at redhat.com> wrote:
> Hello everybody,
> 
> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
> 
> The most prominent feature will be Klaus Wenninger's new implementation
> of event-driven alerts -- the ability to call scripts whenever
> interesting events occur (nodes joining/leaving, resources
> starting/stopping, etc.).

Ooh, that sounds cool!  Can it call scripts after fencing has
completed?  And how is the node that runs the script chosen, and can
that be limited via constraints or something similar?

I'm wondering if it could replace the current fencing_topology hack we
use to invoke fence_compute, which starts the workflow for recovering
VMs from dead OpenStack nova-compute nodes.

Although even if that's possible, maybe there are good reasons to stay
with the fencing_topology approach?
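To make the question concrete, here is a minimal sketch of how an alert script might decide whether a fencing event should kick off the VM-recovery workflow. It assumes the script receives the event details as environment variables named CRM_alert_kind, CRM_alert_node, CRM_alert_task and CRM_alert_rc (the names used in Pacemaker's alert-agent documentation), that "reboot" and "off" are the relevant fencing tasks, and that rc "0" means success. COMPUTE_NODES and evacuation_target() are hypothetical names for illustration only; the actual recovery step is left as a placeholder.

```python
import os
from typing import Mapping, Optional

# Hypothetical list of nova-compute hosts; in practice this would come
# from configuration rather than being hard-coded.
COMPUTE_NODES = {"compute-0", "compute-1"}


def evacuation_target(env: Mapping[str, str]) -> Optional[str]:
    """Return the node whose VMs should be recovered, or None.

    Assumes the alert environment carries CRM_alert_kind, CRM_alert_node,
    CRM_alert_task and CRM_alert_rc.
    """
    # Only fencing events are of interest here.
    if env.get("CRM_alert_kind") != "fencing":
        return None
    # Act only once fencing has actually succeeded (return code 0).
    if env.get("CRM_alert_rc") != "0":
        return None
    # Both "reboot" and "off" leave the node's VMs dead.
    if env.get("CRM_alert_task") not in ("reboot", "off"):
        return None
    node = env.get("CRM_alert_node")
    # Ignore fencing of non-compute nodes (e.g. controllers).
    return node if node in COMPUTE_NODES else None


if __name__ == "__main__":
    target = evacuation_target(os.environ)
    if target is not None:
        # Placeholder: start the fence_compute-style recovery workflow here.
        print(f"would start VM recovery for {target}")
```

Keeping the decision in a pure function of the environment makes it easy to test without a cluster; whether such a script could fully replace the fencing_topology approach (e.g. regarding ordering guarantees) is exactly the open question above.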

Within the same OpenStack compute node HA scenario, it strikes me that
this could be used to invoke "nova service-disable" when the
nova-compute service crashes on a compute node and then fails to
restart.  This would eliminate the window between the crash and the
nova server timing out the nova-compute service.  During that window,
nova-scheduler could otherwise attempt to schedule new VMs on the
compute node whose nova-compute service has crashed.
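A rough sketch of that idea, under the same assumptions about the alert environment (CRM_alert_kind, CRM_alert_rsc, CRM_alert_task, CRM_alert_rc, CRM_alert_node), might look like the following. NOVA_COMPUTE_RSC and disable_command() are hypothetical; the sketch also assumes a failed "start" is the signal that the restart did not work, and that the Pacemaker node name matches the nova host name, which may not hold in every deployment. The command uses the `nova service-disable <hostname> <binary>` form of the nova CLI.

```python
import os
import subprocess
from typing import List, Mapping, Optional

# Hypothetical ID of the nova-compute resource in the cluster configuration.
NOVA_COMPUTE_RSC = "nova-compute"


def disable_command(env: Mapping[str, str]) -> Optional[List[str]]:
    """Return a `nova service-disable` argv if nova-compute failed to start."""
    if env.get("CRM_alert_kind") != "resource":
        return None
    # Strip a possible clone instance suffix such as "nova-compute:1".
    rsc = env.get("CRM_alert_rsc", "").split(":")[0]
    if rsc != NOVA_COMPUTE_RSC:
        return None
    # Only a failed start (non-zero rc) means the restart attempt did not work.
    if env.get("CRM_alert_task") != "start" or env.get("CRM_alert_rc", "0") == "0":
        return None
    host = env.get("CRM_alert_node")
    if not host:
        return None
    # Assumes the Pacemaker node name is also the nova host name.
    return ["nova", "service-disable", host, "nova-compute"]


if __name__ == "__main__":
    cmd = disable_command(os.environ)
    if cmd is not None:
        # Credentials (OS_* variables) must already be available to the script.
        subprocess.run(cmd, check=False)
```

Disabling the service only stops nova-scheduler from choosing that host; it does not touch VMs already running there.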

IIUC, this is one area where masakari is currently more sophisticated
than the approach based on OCF RAs:

https://github.com/ntt-sic/masakari/blob/master/docs/evacuation_patterns.md#evacuation-patterns

Does that make sense?