[ClusterLabs Developers] Proposed future feature: multiple notification scripts

Sun Dec 6 22:53:45 UTC 2015

> On 5 Dec 2015, at 3:32 AM, Jan Pokorný <jpokorny at redhat.com> wrote:
> 
> On 04/12/15 12:33 +1100, Andrew Beekhof wrote:
>>> On 4 Dec 2015, at 2:45 AM, Jan Pokorný <jpokorny at redhat.com> wrote:
>>> On 02/12/15 17:23 -0600, Ken Gaillot wrote:
>>>> This will be of interest to cluster front-end developers and anyone who
>>>> needs event notifications ...
>>>> 
>>>> One of the new features in Pacemaker 1.1.14 will be built-in
>>>> notifications of cluster events, as described by Andrew Beekhof on That
>>>> Cluster Guy blog:
>>>> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
>>>> 
>>>> For a future version, we're considering extending that to allow multiple
>>>> notification scripts, each with multiple recipients. This would require
>>>> a significant change in the CIB. Instead of a simple cluster property,
>>>> our current idea is a new configuration section in the CIB, probably
>>>> along these lines:
>>>> 
>>>> <configuration>
>>>>  <!-- usual crm_config etc. here -->
>>>> 
>>>>  <!-- this is the new section -->
>>>>  <notifications>
>>>> 
>>>>     <!-- each script would be in a notify element -->
>>>>     <notify id="notify-1" path="/my/script.sh" timeout="30s">
>>>> 
>>>>        <recipient id="recipient-1" value="me@example.com" />
>>>>        <!-- etc. for multiple recipients -->
>>>> 
>>>>     </notify>
>>>> 
>>>>     <!-- etc. for multiple scripts -->
>>>> 
>>>>  </notifications>
>>>> </configuration>
>>>> 
>>>> 
>>>> The recipient values would be passed to the script as command-line
>>>> arguments (ex. "/my/script.sh me@example.com").
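Under this proposal, each recipient value becomes a separate command-line argument to the script. The following is a minimal sketch of a script written against that convention; it is not Pacemaker's own agent. It assumes the event description arrives in a hypothetical NOTIFY_DESC environment variable (the thread does not say how event details are passed), and deliver is a hypothetical placeholder that prints instead of sending mail, SNMP traps or SMS.

```shell
#!/bin/sh
# Hypothetical notification script for the proposed <notify> element.
# Per the proposal, each recipient value arrives as a command-line argument:
#   /my/script.sh me@example.com ops@example.com
# ASSUMPTION: the event description is read from a hypothetical NOTIFY_DESC
# environment variable; the thread does not specify how event details arrive.

# Placeholder transport (hypothetical): prints what would be sent.
# Replace the printf with mail, an SNMP trap, an SMS gateway call, etc.
deliver() {
    printf 'to=%s event=%s\n' "$1" "$2"
}

notify_all() {
    desc="${NOTIFY_DESC:-unknown event}"
    # One delivery per recipient; quoting keeps values containing spaces intact.
    for recipient in "$@"; do
        deliver "$recipient" "$desc"
    done
}

notify_all "$@"
```

For example, NOTIFY_DESC="node1 fenced" /my/script.sh me@example.com ops@example.com prints one to=... line per recipient, in argument order. Adding a recipient in the CIB then needs no change on any node, only a new argument.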
>>> 
>>> Just thinking out loud, Pacemaker is well adapted to cope with
>>> asymmetric/heterogeneous nodes (incl. user-assisted optimizations
>>> such as a non-default "resource-discovery" property of a location
>>> constraint).
>>> 
>>> Setting notifications universally for all nodes may be desirable
>>> in some scenarios, but may not be optimal if nodes may diverge,
>> 
>> Correct always wins over optimal.
>> 
>> I wouldn’t optimise for scripts that only apply to specific
>> resources that also don’t run everywhere - at most you waste a few
>> cycles.  If that ever becomes a real issue we can add a filter to
>> the notify block.
>> 
>> Far worse is if a service can run somewhere new and you forgot to
>> copy the script across… The knowledge doesn’t exist to report that
>> as a problem.
>> 
>> The common scenario will be feeding fencing events into things like
>> Galera or Nova and sending them via different transports, like SNMP, SMS,
>> email.  Particularly sending SNMP alerts into a fully fledged
>> monitoring and alerts system that finds duplicates and does advanced
>> filtering.  We do not and should not be trying to reimplement that.
>> 
>>> or will for sure:
>>> 
>>> (1) the script may not be distributed across all the nodes
>> 
>> That’s a bug, not a feature.
> 
> See below.
> 
>>>   - or (1b) it is located at the shared storage that will become
>>>     available later during cluster life cycle because it is
>>>     a subject of cluster service management as well
>> 
>> How will that script send a notification that the shared storage is
>> no longer available?
> 
> This was mostly based on the (admittedly made-up) assumption that the
> notification script is checked for existence only once.  On the other
> hand, if it isn't, a periodic recheck won't be drastically different in
> complexity from a periodic dir rescan (and optimizations on some systems
> do exist).

A rescan isn’t going to help you send the “I’ve just stopped the shared storage” notification.
There is nothing there to rescan.

> 
>>> (2) one intentionally wants to run the notification mechanism
>>>   on a subset of nodes
>> 
>> Can you explain to me when that would be a good idea?
> 
> I don't know the finer details of how it all should work, but
> it may be desirable, e.g., to decide whether the notification agent
> should also run in the pacemaker_remote case or not.

They don’t. The alerts come from the node it’s connected to.

>  Or you may want to run backup
> SMS notifications only on the nodes with a GSM module installed.

Apart from sounding like a lot of work to avoid

  if [ ! -e /bin/sometool ]; then exit 0; fi

It doesn’t make sense that receiving alerts would be so crucial that they’d configure redundant paths, but only do so on a subset of the nodes.
That would be like only configuring fencing for some of the nodes.
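The one-line guard above can be expanded into a complete script that is deployed identically to every node. This is a sketch only: the SMS client path /usr/bin/send-sms and its "recipient message" calling convention are hypothetical placeholders, and SMS_TOOL is an assumed override variable. Nodes without the tool simply succeed without doing anything, so no per-node notification configuration is needed.

```shell
#!/bin/sh
# Sketch: one notification script deployed identically to every node. Nodes
# without the SMS transport (e.g. no GSM module) exit successfully instead
# of needing a per-node notification configuration.
# HYPOTHETICAL: /usr/bin/send-sms and its "recipient message" arguments are
# placeholders, not a real tool; SMS_TOOL is an assumed override variable.
SMS_TOOL="${SMS_TOOL:-/usr/bin/send-sms}"

send_sms_notifications() {
    # The guard from the thread. -x (executable) is used rather than -e
    # (exists), since the file must be runnable, not merely present.
    if [ ! -x "$SMS_TOOL" ]; then
        return 0    # nothing to do on this node, and not an error
    fi
    for recipient in "$@"; do
        "$SMS_TOOL" "$recipient" "cluster event on $(uname -n)"
    done
}

send_sms_notifications "$@"
```

Because the check happens at run time, installing a GSM module on another node later requires no configuration change at all.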

> 
>> Particularly when those nodes are the only remaining survivors
>> (which you can’t know isn’t the case).
>> If we don’t care about the services on those nodes, why did we make
>> them HA?
> 
> You can achieve a good-enough HA notification mechanism by using
> multiple non-HA notification methods, just as you do with fencing
> topologies, or just as an HA cluster uses multiple nodes that are not
> HA by themselves.
> 
>>> Note also that once you have the responsibility to distribute the
>>> script on your own, you can use the same distribution mechanism to
>>> share your configuration for this script, as an alternative to using
>>> the "value" attribute in the above proposal
>> 
>> So instead of using a standard pool of agents and pcs to set a
>> value, I get to maintain two sets of files on every node in the
>> cluster?
>> And this is supposed to be a feature?
> 
> Just wanted to point out that the CIB solves only a subset of
> orchestration problems.

The CIB is a dumb data store, why are we talking about orchestration?
All we’re trying to do is get information out of pacemaker and into people’s alerting frameworks.
We’re not trying to re-invent those frameworks.

>  Tools like pcs add only a tiny fraction to this subset.
> 
> A standard pool of agents + (mostly) single-value customization via
> a central place (the CIB) sounds good; I'm not discounting this at all.
> 
>>> (and again, this way, you
>>> are free to have an asymmetric configuration).  There are tons
>>> of cases like that which one has to deal with already (some RAs,
>>> the file with the secret key for Corosync, ...).
>>> 
>>> What I am proposing is an alternative/parallel mechanism
>>> that better fits the asymmetric (and asynchronous from the cluster
>>> life cycle POV) use cases: good old drop-in files.  There would simply
>>> be a dedicated directory (say /usr/share/pacemaker/notify.d) where
>>> the software interested in notifications would place its own
>>> listener script (or a symlink thereof); the script is then discovered
>>> by Pacemaker upon a subsequent dir rescan or inotify event, done.
>>> 
>>> --> no configuration needed (or is external to the CIB, or is
>>>   interspersed in a non-invasive way there), install and go
>>> 
>>> --> it has a local-only effect, just as the installation of the
>>>   respective software utilizing notifications is local
>>>   (and just as the handling of the notifications is local!)
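For comparison, the drop-in directory idea could dispatch an event roughly as follows. This is a sketch, not existing Pacemaker behaviour: the directory path is the example given in the proposal, NOTIFY_DIR is a hypothetical override, and the rescan/inotify discovery step is replaced by simply globbing the directory on every event.

```shell
#!/bin/sh
# Sketch of the proposed drop-in mechanism: every executable file (or
# symlink to one) in a notify.d directory is a listener, called once per
# event with the event's arguments. NOT existing Pacemaker behaviour; the
# path is the proposal's example, NOTIFY_DIR is a hypothetical override.
NOTIFY_DIR="${NOTIFY_DIR:-/usr/share/pacemaker/notify.d}"

dispatch_event() {
    [ -d "$NOTIFY_DIR" ] || return 0   # nothing installed on this node
    status=0
    # The glob expands in sorted order, so listeners run in a stable order.
    for listener in "$NOTIFY_DIR"/*; do
        # Skip non-executables (READMEs, backups, the unexpanded glob of an
        # empty directory). test -f follows symlinks, so links are accepted.
        [ -f "$listener" ] && [ -x "$listener" ] || continue
        # One failing listener must not prevent the others from running.
        "$listener" "$@" || status=1
    done
    return "$status"
}

dispatch_event "$@"
```

Globbing on each event sidesteps the rescan/inotify question entirely. Note that a node whose directory is empty or missing silently sends nothing, which is the failure mode raised earlier in the thread.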
>> 
>> Still not a feature.
> 
> I am soliciting feedback to learn more about its usefulness,
> if you define feature := something useful.

Yes, I’m saying it’s neither.

> 
> -- 
> Jan (Poki)
> _______________________________________________
> Developers mailing list
> Developers at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers