[ClusterLabs] Antw: Coming in 1.1.15: Event-driven alerts

Fri Apr 22 13:42:37 EDT 2016

On 04/22/2016 02:43 AM, Klaus Wenninger wrote:
> On 04/22/2016 08:16 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 21.04.2016 at 19:50 in message
>> <571912F3.2060104 at redhat.com>:
>>
>> [...]
>>> The alerts section can have any number of alerts, which look like:
>>>
>>>    <alert id="alert-1"
>>>           path="/srv/pacemaker/pcmk_alert_sample.sh">
>>>
>>>       <recipient id="alert-1-recipient-1"
>>>                  value="/var/log/cluster-alerts.log" />
>>>
>>>    </alert>
>> Are there any parameters supplied for the script? For the XML: I think "path" for the script to execute is somewhat generic: Why not call it "exec" or something like that? Likewise for "value": Isn't "logfile" a better name?
> exec has a certain appeal...
> but a recipient can be anything (an email address, a log file, ...), so
> keeping it general, like "value", makes sense to me.
>>
>>> As always, id is simply a unique label for the entry. The path is an
>>> arbitrary file path to an alert script. Existing external scripts used
>>> with ClusterMon resources will work as alert scripts, because the
>>> interface is compatible.
>>>
>>> We intend to provide sample scripts in the extra/alerts source
>>> directory. The existing pcmk_notify_sample.sh script has been moved
>>> there (as pcmk_alert_sample.sh), and so has pcmk_snmp_helper.sh.
>>>
>>> Each alert may have any number of recipients configured. These values
>> What I did not understand is how an "alert" is related to some cluster "event": By ID, or by some explicit configuration?
> There are "node", "fencing" and "resource" alerts (CRM_alert_kind tells
> you which, if you want to know inside a script). The term "alert" was
> chosen because it matches other frameworks such as nagios, but you can
> treat it as a synonym for "event": it is not necessarily anything bad
> or good, just something you might be interested in.
> 
> A bunch of environment variables are set when your executable is
> called; you can use them to get more info and add intelligence if you like:
> 
> CRM_alert_node, CRM_alert_nodeid, CRM_alert_rsc, CRM_alert_task,
> CRM_alert_interval, CRM_alert_desc, CRM_alert_status,
> CRM_alert_target_rc, CRM_alert_rc, CRM_alert_kind,
> CRM_alert_version, CRM_alert_node_sequence,
> CRM_alert_timestamp
> 
> Nodes and resources are referenced by node name and resource ID, as
> throughout the pacemaker configuration in the CIB.
> 
> 
>>
>>> will simply be passed to the script as arguments. The first recipient
>>> will also be passed as the CRM_alert_recipient environment variable, for
>>> compatibility with existing scripts that only support one recipient.
>>> (All CRM_alert_* variables will also be passed as CRM_notify_* for
>>> compatibility with existing ClusterMon scripts.)
>>>
>>> An alert may also have instance attributes and meta-attributes, for example:
>>>
>>>    <alert id="alert-1"
>>>           path="/srv/pacemaker/pcmk_alert_sample.sh">
>>>
>>>       <meta_attributes id="alert-1-meta">
>>>          <nvpair id="alert-1-timeout" name="timeout" value="10s" />
>>>       </meta_attributes>
>>>
>>>       <instance_attributes id="alert-1-vars">
>>>         <nvpair id="alert-1-vars-1" name="magic" value="1" />
>>>         <nvpair id="alert-1-vars-2" name="something" value="true" />
>>>       </instance_attributes>
>>>
>>>       <recipient id="alert-1-recipient-1"
>>>                  value="/var/log/cluster-alerts.log" />
>>>
>>>    </alert>
>>>
>>> The meta-attributes are optional properties used by the cluster.
>>> Currently, they include "timeout" (which defaults to 30s) and
>>> "tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a
>>> microsecond-resolution timestamp provided to the alert script as the
>>> CRM_alert_timestamp environment variable).
>>>
>>> The instance attributes are arbitrary values that will be passed as
>>> environment variables to the alert script. This provides you a
>>> convenient way to configure your scripts in the cluster, so you can
>>> easily reuse them.
>> At the moment this still sounds quite abstract.
> Meta-attributes and instance attributes are used as they are with
> resources. Meta-attributes are configuration parameters for pacemaker
> itself: in this case the timeout enforced while the script runs, and
> the format string that tells pacemaker how you would like
> CRM_alert_timestamp to be filled.
> By the way, this timestamp is created immediately before all alerts
> are fired off in parallel, so it can be used to analyze what happened
> in which order in the cluster. That is much better than calling date
> inside a script that runs as a separate process and may have been delayed.
> 
> Instance attributes let you tell your script whatever you like, and
> because they reside in the CIB they are visible and synchronized
> throughout the cluster.

It is abstract, because instance attributes are interpreted by the alert
script you provide; pacemaker merely passes them along. It's comparable
to instance attributes for a resource -- pacemaker just passes them to
the resource agent.

A concrete example might be a script that emails somebody. It might take
the email address as the recipient, and the subject line as an instance
attribute. Maybe it could also take a time limit as an instance
attribute, and not send emails more often than that, to avoid filling up
someone's inbox when things go haywire.
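
As a rough sketch of that email idea (not a shipped script): the instance
attribute names "email_subject" and "min_interval", the stamp file location,
and the availability of a "mail" command are all assumptions made up for
illustration. Pacemaker only passes the attributes through as environment
variables; the script decides what they mean.

```shell
#!/bin/sh
# Hypothetical alert script: mail the recipient, but at most once per
# min_interval seconds. email_subject and min_interval are made-up
# instance attribute names, not anything pacemaker defines.

subject="${email_subject:-Cluster alert}"
min_interval="${min_interval:-300}"
stamp_file="${stamp_file:-/tmp/alert_mail.stamp}"

# Succeed if at least min_interval seconds have passed since the last
# mail, or if no mail has been recorded yet.
rate_limit_ok() {
    now=$(date +%s)
    last=0
    if [ -r "$stamp_file" ]; then
        last=$(cat "$stamp_file")
    fi
    last=${last:-0}
    [ $((now - last)) -ge "$min_interval" ]
}

# Send one mail (assumes a mailx-style "mail" command) and record when.
send_alert() {
    rate_limit_ok || return 0
    date +%s > "$stamp_file"
    printf '%s\n' "${CRM_alert_desc:-no description}" |
        mail -s "$subject" "$CRM_alert_recipient"
}

# Do nothing unless pacemaker supplied a recipient.
if [ -n "${CRM_alert_recipient:-}" ]; then
    send_alert
fi
```

Keeping the time limit in an instance attribute means the same script can be
reused with a different limit per alert, without editing the script.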

Another example might be a script that pushes a status to a monitoring
system such as nagios. It might take the nagios server address as the
recipient, and the name of the nagios check being updated as an instance
attribute.
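
A similarly hedged sketch for the nagios case: it assumes passive checks are
submitted with send_nsca, that the instance attribute is called
"nagios_check" (a made-up name), and a deliberately simplistic status
mapping that only makes sense for resource alerts.

```shell
#!/bin/sh
# Hypothetical alert script: push a passive check result to nagios.
# nagios_check is a made-up instance attribute name.

# Build one passive-check line in the tab-separated form send_nsca
# reads by default: host, service, return code, plugin output.
nagios_line() {
    # Simplistic mapping (an assumption): OK if the operation returned
    # what was expected, CRITICAL otherwise.
    if [ "${CRM_alert_rc:-0}" = "${CRM_alert_target_rc:-0}" ]; then
        state=0   # OK
    else
        state=2   # CRITICAL
    fi
    printf '%s\t%s\t%s\t%s\n' \
        "${CRM_alert_node:-unknown}" "${nagios_check:-pacemaker}" \
        "$state" "${CRM_alert_desc:-no description}"
}

# The recipient is taken to be the nagios server address.
if [ -n "${CRM_alert_recipient:-}" ]; then
    nagios_line | send_nsca -H "$CRM_alert_recipient"
fi
```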

I expect as time goes on, we'll have a nice collection of ready-made
alert scripts for common use cases. But for now, you'll have to write
the alert scripts yourself.

>>> In the current implementation, meta-attributes and instance attributes
>>> may also be specified within the <recipient> block, in which case they
>>> override any values specified in the <alert> block when sent to that
>>> recipient. Whether this stays in the final 1.1.15 release or not depends
>>> on whether people find this to be useful, or confusing.
>> Could you give one complete example (configuration and script), even if it's just as a sample for discussion?
>>
>> And will the DTD version number be incremented this time? ;-)
> pcmk_alert_sample.sh is not a bad example of how to use the
> environment variables set by default, although at the moment
> it still uses the deprecated CRM_notify_... naming (instead of
> CRM_alert_...), which is kept for compatibility with
> scripts written for the two predecessor implementations.
> 
> Another example config, which also shows instance attributes and how to
> override them inside the recipient section, would be:
> 
> <configuration>
>   <alerts>
>     <alert id="notify_9"
>            path="/usr/share/pacemaker/tests/pcmk_alert_sample.sh">
>       <meta_attributes id="meta_9">
>         <nvpair id="tstamp9" name="tstamp_format" value="%H:%M:%S.%06N"/>
>       </meta_attributes>
>       <instance_attributes id="global_vars_9">
>         <nvpair id="global_var9_1" name="variable1" value="1"/>
>         <nvpair id="global_var9_2" name="global2" value="1"/>
>       </instance_attributes>
>       <recipient id="recipient_9" value="/tmp/alerts.log">
>         <instance_attributes id="local_vars_9">
>           <nvpair id="local_var9_1" name="variable2" value="2"/>
>           <nvpair id="local_var9_2" name="global1" value="overwritten"/>
>         </instance_attributes>
>       </recipient>
>     </alert>
>   </alerts>
> </configuration>
> 
> To get a feeling for it, you can add "set >> /tmp/set.txt" somewhere in
> pcmk_alert_sample.sh.
> But it is actually simple: just use them as environment variables with
> the names you specified, without any prefix or anything.
> 
> Yes, the CIB has version 2.5 now.
> 
> The feature is going to get an update in "Pacemaker Explained",
> where I intend to include a maybe snappier example as well.

"Pacemaker Explained" currently documents the ClusterMon resource for
this purpose; see:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617349469888

That's the chapter that will be updated for the new approach. The
scripts will be compatible, so you can see the current documentation for
"Configuring Notifications via External-Agent" for a description of the
variables that the script will receive. For compatibility, each variable
will be provided as both CRM_notify_(whatever) and CRM_alert_(whatever).
New scripts should use the CRM_alert_* variables.
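
For a script that has to run under both the old and the new interface, one
possible approach is to prefer the CRM_alert_* name and fall back to
CRM_notify_*. The helper names alert_var and log_alert below are invented
for this sketch, and the log line format is arbitrary; the recipient is
assumed to be a log file path, as in the sample configuration.

```shell
#!/bin/sh
# Look up an alert variable by suffix, preferring CRM_alert_<name> and
# falling back to the older CRM_notify_<name>. Prints an empty string
# if neither is set.
alert_var() {
    eval "printf '%s' \"\${CRM_alert_$1:-\${CRM_notify_$1:-}}\""
}

# Print one log line for the current alert, depending on its kind.
log_alert() {
    kind=$(alert_var kind)
    case "$kind" in
        node)
            msg="node $(alert_var node): $(alert_var desc)" ;;
        fencing)
            msg="fencing: $(alert_var desc)" ;;
        resource)
            msg="resource $(alert_var rsc) $(alert_var task) on $(alert_var node): $(alert_var desc)" ;;
        *)
            msg="unknown alert kind '$kind'" ;;
    esac
    printf '%s %s\n' "$(alert_var timestamp)" "$msg"
}

# Append to the recipient, treated here as a log file path.
recipient=$(alert_var recipient)
if [ -n "$recipient" ]; then
    log_alert >> "$recipient"
fi
```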

>>> Sometime during the 1.1.15 release cycle, the previous experimental
>>> interface (the notification-agent and notification-recipient cluster
>>> properties) will be disabled by default at compile-time. If you are
>>> compiling the master branch from source and require that interface, you
>>> can define RHEL7_COMPAT when building, to enable support.
>>>
>>> This feature is already in the upstream master branch, and will be in
>>> the forthcoming 1.1.15-rc1 release candidate. Everyone is encouraged to
>>> try it out and give feedback.
>>
>> Regards,
>> Ulrich