[ClusterLabs] Pacemaker alert framework

Fri Jul 6 10:20:41 EDT 2018

On Fri, 2018-07-06 at 15:58 +0200, Klaus Wenninger wrote:
> On 07/06/2018 03:41 PM, Ian Underhill wrote:
> > requirement:
> > when a resource fails, perform an action: run a script on all nodes
> > within the cluster before the resource is relocated, i.e. gather
> > information about why the resource failed.
>  
> Keep in mind that trying to run a script on all nodes in the cluster
> before proceeding is a delicate issue, because not being able to run
> it on one node might prevent relocation and thus hurt availability.
> Of course this largely depends on how it is implemented; I just wanted
> to raise awareness.
>
> > what I have looked into:
> > 1) Use the monitor call within the resource to SSH to all nodes;
> > again, SSH configuration is needed.
> > 2) Alert framework: this only seems to be triggered on nodes
> > involved in the relocation of the resource, i.e. if the resource
> > moves from node1 to node2, node3 doesn't know. So back to the SSH
> > solution :(
>  
> Alerts are designed not to block anything (not even other alerts),
> so the alert agents are only called on nodes that are already
> involved in that very event.
> 
> > 3) Sending a custom alert to all nodes in the cluster? Is this
> > possible? I haven't found a way.
> > 
> > only solution I have:
> > 1) Use SSH within an alert agent (on stop) to SSH onto all nodes to
> > perform the action. The nodes could be configured using the alert's
> > recipients, but I would still need to configure SSH users and
> > certs etc.
> >      1.a) This doesn't seem to be usable if the resource is
> > relocated back to the same node, as the start/stop alerts are run
> > at the "same time", i.e. I need to delay the start until the SSH
> > has completed.
> > 
> > what I would like:
> > 1) Delay the start/relocation of the resource until the information
> > from all nodes has been gathered, using only Pacemaker
> > behaviour/config.
> > 
> > Any ideas?
>  
> Not 100% sure what your exact intention is ...
> You could experiment with a clone that depends on the running
> instance and use the stop of that clone to trigger whatever you need.
> I'm not sure, but I'd expect Pacemaker to tear down all clone
> instances before it relocates your resource.

Yes, that sounds like a good solution. See clone notifications:

http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#_clone_resource_agent_requirements

You could even combine everything into a single custom resource agent
for use as a master/slave resource, where the master is the only
instance that actually runs the resource, and the slaves just act on
the notifications.
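As a rough illustration of the notification approach, here is a minimal Python sketch of the notify handling such a combined agent might contain. It assumes the resource is configured as a master/slave clone with notify=true, and that Pacemaker passes the standard notify variables (OCF_RESKEY_CRM_meta_notify_type, OCF_RESKEY_CRM_meta_notify_operation, OCF_RESKEY_CRM_meta_notify_demote_uname) in the environment. gather_diagnostics() and demoting_nodes() are hypothetical names, and the start/stop/promote/demote/monitor/meta-data actions are deliberately left out.

```python
#!/usr/bin/env python3
"""Sketch of notify handling for a hypothetical master/slave OCF agent.

Assumptions: the resource runs as a master/slave clone with notify=true;
gather_diagnostics() is a hypothetical placeholder for site-specific logic.
"""
import os
import sys

# Standard OCF return codes used by this sketch
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3


def demoting_nodes(env):
    """Return the nodes losing the master role if this is a pre-demote
    notification, else an empty list. Pacemaker passes node lists as
    space-separated strings."""
    if env.get("OCF_RESKEY_CRM_meta_notify_type") != "pre":
        return []
    if env.get("OCF_RESKEY_CRM_meta_notify_operation") != "demote":
        return []
    return env.get("OCF_RESKEY_CRM_meta_notify_demote_uname", "").split()


def gather_diagnostics(failing_nodes):
    """Hypothetical placeholder: collect local information about the
    failure. Keep it short, since notify actions have their own timeout."""
    sys.stderr.write("gathering diagnostics; master leaving: %s\n"
                     % " ".join(failing_nodes))


def main(argv, env):
    action = argv[1] if len(argv) > 1 else ""
    if action == "notify":
        failing = demoting_nodes(env)
        if failing:
            gather_diagnostics(failing)
        return OCF_SUCCESS
    # Real start/stop/promote/demote/monitor/meta-data handling omitted.
    return OCF_ERR_UNIMPLEMENTED


if __name__ == "__main__":
    sys.exit(main(sys.argv, os.environ))
```

Since pre-notifications go to every active instance, the gathering step should run on each node hosting an instance, not only on the nodes involved in the move. Keep it quick and make it always return success, for the availability reason Klaus raised above.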

> 
> Regards,
> Klaus
> 
> > Thanks
> > 
> > /Ian.
-- 
Ken Gaillot <kgaillot at redhat.com>