[ClusterLabs] Coming in 1.1.15: Event-driven alerts

Klaus Wenninger kwenning at redhat.com
Mon May 9 14:07:31 UTC 2016


While preparing the updated Pacemaker Explained section, I read through
this announcement again. I guess most points where it differs from the
actual implementation currently in the repo have been discussed already,
but where I remember them I'll insert them here anyway to have them
in one place.
The reason I'm writing this ahead of committing the updated
Pacemaker Explained section is that I think nobody has spotted so far
that the default behavior for the timestamp format is not as written
in this announcement. See below ...

On 04/21/2016 07:50 PM, Ken Gaillot wrote:
> Hello everybody,
>
> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
>
> The most prominent feature will be Klaus Wenninger's new implementation
> of event-driven alerts -- the ability to call scripts whenever
> interesting events occur (nodes joining/leaving, resources
> starting/stopping, etc.).
>
> This is the improved successor to both the ClusterMon resource agent and
> the experimental "notification-agent" feature that has been in the
> upstream master branch.
>
> The new feature was renamed to "alerts" to avoid confusion with the
> unrelated "notify" resource action.
>
> High-level tools such as crm and pcs should eventually provide an easy
> way to configure this, but at the XML level, the cluster configuration
> may now contain an alerts section:
>
>    <configuration>
>       ...
>       <alerts>
>          ...
>       </alerts>
>    </configuration>
>
> The alerts section can have any number of alerts, which look like:
>
>    <alert id="alert-1"
>           path="/srv/pacemaker/pcmk_alert_sample.sh">
>
>       <recipient id="alert-1-recipient-1"
>                  value="/var/log/cluster-alerts.log" />
>
>    </alert>
>
> As always, id is simply a unique label for the entry. The path is an
> arbitrary file path to an alert script. Existing external scripts used
> with ClusterMon resources will work as alert scripts, because the
> interface is compatible.
>
> We intend to provide sample scripts in the extra/alerts source
> directory. The existing pcmk_notify_sample.sh script has been moved
> there (as pcmk_alert_sample.sh), and so has pcmk_snmp_helper.sh.
>
> Each alert may have any number of recipients configured. These values
> will simply be passed to the script as arguments. The first recipient
As already picked up by lmb and discussed here (repeated just for
completeness): exposing the recipients by passing them as arguments
wouldn't be the best idea, and it wouldn't be compatible with the
predecessor implementations.
So the recipient is passed as an environment variable
(CRM_alert_recipient and, for compatibility reasons, also
CRM_notify_recipient).
> will also be passed as the CRM_alert_recipient environment variable, for
> compatibility with existing scripts that only support one recipient.
If an alert section has more than one recipient section, the script is
called once for each of the recipients.
As the recipient section is not compulsory, you can also leave it out;
the environment variable that would otherwise hold the recipient is
then empty.
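
To illustrate, here is a minimal sketch of how a script could pick up the
recipient. The helper names resolve_recipient and deliver are made up for
this example and are not part of Pacemaker, and treating the recipient as
a file path is just an assumption that matches the sample configuration
in the announcement.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): prefer the new variable,
# fall back to the compatibility name, and print nothing when no
# recipient section is configured.
resolve_recipient() {
    printf '%s' "${CRM_alert_recipient:-${CRM_notify_recipient:-}}"
}

# Hypothetical helper: append the message to the recipient, which is
# assumed to be a file path here. Without a recipient the message goes
# to stderr instead.
deliver() {
    recipient=$(resolve_recipient)
    if [ -n "$recipient" ]; then
        printf '%s\n' "$1" >> "$recipient"
    else
        printf '%s\n' "$1" >&2
    fi
}
```

Since the script is called once per recipient, it only ever has to deal
with a single value here.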

For cases where the recipient is really a log file, for example, you
could then add an instance attribute called "log_file" instead, if you
like that better.
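
A rough sketch of how a script might use such an attribute. This assumes
that the instance attribute reaches the script as an environment variable
with exactly the attribute's name (log_file); please check that against
your build. The function name alert_log_target is made up for the example.

```shell
#!/bin/sh
# Assumption: an instance attribute named "log_file" is visible to the
# script as an environment variable of the same name.
# Hypothetical helper: choose the log target, preferring log_file, then
# the recipient, and finally /dev/null so that writing never fails.
alert_log_target() {
    printf '%s' "${log_file:-${CRM_alert_recipient:-/dev/null}}"
}
```

A script could then append its message with something like
printf '%s\n' "$msg" >> "$(alert_log_target)".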

So with the current implementation, if you want multiple recipients
handled by a script in one go, you would, at the moment, have just one
recipient section with the actual recipients being e.g. comma-separated.
Of course you then get a coarser granularity of observation by
Pacemaker, because you have just one timeout for contacting all
recipients instead of one per recipient (which at the moment can even
be configured individually, if you like).
If your script is not able to handle multiple instances being executed
in parallel, for whatever reason, you would go this way as well at the
moment.
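
As a sketch of the comma-separated variant: the function below walks
through CRM_alert_recipient and calls a handler once per entry. The name
for_each_recipient and the handler convention are invented for this
example; entries are assumed to contain no commas and no surrounding
whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): call the command given as
# $1 once for every comma-separated entry in CRM_alert_recipient.
# Empty entries (e.g. from "a,,b" or a trailing comma) are skipped.
for_each_recipient() {
    handler=$1
    rest=${CRM_alert_recipient:-}
    while [ -n "$rest" ]; do
        entry=${rest%%,*}            # text before the first comma
        case $rest in
            *,*) rest=${rest#*,} ;;  # drop that entry and its comma
            *)   rest= ;;            # last entry, nothing left
        esac
        if [ -n "$entry" ]; then
            "$handler" "$entry"
        fi
    done
}
```

Keep in mind that all entries are then handled sequentially within the
single timeout of that one recipient section.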
> (All CRM_alert_* variables will also be passed as CRM_notify_* for
> compatibility with existing ClusterMon scripts.)
>
> An alert may also have instance attributes and meta-attributes, for example:
>
>    <alert id="alert-1"
>           path="/srv/pacemaker/pcmk_alert_sample.sh">
>
>       <meta_attributes id="alert-1-meta">
>          <nvpair id="alert-1-timeout" name="timeout" value="10s" />
>       </meta_attributes>
>
>       <instance_attributes id="alert-1-vars">
>         <nvpair id="alert-1-vars-1" name="magic" value="1" />
>         <nvpair id="alert-1-vars-2" name="something" value="true" />
>       </instance_attributes>
>
>       <recipient id="alert-1-recipient-1"
>                  value="/var/log/cluster-alerts.log" />
>
>    </alert>
>
> The meta-attributes are optional properties used by the cluster.
> Currently, they include "timeout" (which defaults to 30s) and
Because it is more consistent with other meta-attributes used elsewhere,
and because it was requested on the list for that reason, "tstamp_format"
has meanwhile been renamed to "timestamp-format" in the HEAD of the
1.1 branch.
> "tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a
Actually I had intended to default it to seconds since the epoch ("%i"),
but checking again what I implemented, it doesn't set the
environment variable at all by default.
But I guess "%H:%M:%S.%06N" makes sense, which is why I have just
altered the behavior so that this will be the default in the next
iteration.
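
Until then a script should not rely on the variable being set. A minimal
sketch of a defensive lookup; the function name alert_timestamp is made
up, and the fallback to the script's own clock with second resolution is
just one possible choice.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): print the timestamp handed
# over by the cluster, trying the compatibility name as well. If neither
# variable is set, fall back to the local time of the script itself.
alert_timestamp() {
    ts=${CRM_alert_timestamp:-${CRM_notify_timestamp:-}}
    if [ -n "$ts" ]; then
        printf '%s\n' "$ts"
    else
        date '+%H:%M:%S'
    fi
}
```

Note that the fallback is the time the script runs, not the time the
event was detected, so the two can differ slightly.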
> microsecond-resolution timestamp provided to the alert script as the
> CRM_alert_timestamp environment variable).
>
> The instance attributes are arbitrary values that will be passed as
> environment variables to the alert script. This provides you a
> convenient way to configure your scripts in the cluster, so you can
> easily reuse them.
>
> In the current implementation, meta-attributes and instance attributes
> may also be specified within the <recipient> block, in which case they
> override any values specified in the <alert> block when sent to that
> recipient. Whether this stays in the final 1.1.15 release or not depends
> on whether people find this to be useful, or confusing.
As already spotted by somebody on the list, this paragraph only makes
sense if the script is called once for each recipient, as pointed out
above.
>
> Sometime during the 1.1.15 release cycle, the previous experimental
> interface (the notification-agent and notification-recipient cluster
> properties) will be disabled by default at compile-time. If you are
> compiling the master branch from source and require that interface, you
> can define RHEL7_COMPAT when building, to enable support.
>
> This feature is already in the upstream master branch, and will be in
> the forthcoming 1.1.15-rc1 release candidate. Everyone is encouraged to
> try it out and give feedback.
Regards,
Klaus



