[ClusterLabs] Pacemaker 1.1.15 - Release Candidate 4
kgaillot at redhat.com
Sun Jun 12 10:50:23 EDT 2016
On 06/12/2016 07:28 AM, Ferenc Wágner wrote:
> Ken Gaillot <kgaillot at redhat.com> writes:
>> With this release candidate, we now provide three sample alert scripts
>> to use with the new alerts feature, installed in the
>> /usr/share/pacemaker/alerts directory.
> Is there a real reason to name these scripts *.sample? Sure, they are
> samples, but they are also usable as-is, aren't they?
Almost as-is -- copy them somewhere, rename them without ".sample", and
mark them executable.
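
For anyone scripting those three steps, a minimal sketch follows. The helper name install_sample and the destination directory are hypothetical, chosen only for illustration; they are not something the release provides.

```python
import os
import shutil
import stat


def install_sample(sample_path, dest_dir):
    """Copy a *.sample alert script into dest_dir, drop the suffix,
    and mark the copy executable. Returns the path of the new file."""
    name = os.path.basename(sample_path)
    if name.endswith(".sample"):
        name = name[: -len(".sample")]
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, name)
    shutil.copyfile(sample_path, dest)
    # Add execute permission for user, group and other.
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```

The copy should still be read and edited before it is configured as an alert; the helper only automates the mechanical part.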
After some discussion, we decided that this feature is not mature enough
yet to provide the scripts for direct use. After we get some experience
with how users actually use the feature and the sample scripts, we can
gain more confidence in recommending them generally. Until then, we
recommend that people examine the script source and edit it to suit
their needs before using it.
That said, I think the SNMP script in particular is quite useful.
The log-to-file script is more a proof-of-concept that people can use as
a template. The SMTP script may be useful, but probably paired with some
custom software handling the recipient address, to avoid flooding a real
person's mailbox when a cluster is active.
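
One shape such custom software could take is a per-recipient throttle in front of the mailer. This is only a sketch under stated assumptions: the class name and the 300-second window are hypothetical, and because the state lives in memory it assumes a long-running helper process rather than the alert script itself, which is spawned fresh for each event.

```python
import time


class RecipientThrottle:
    """Allow at most one message per recipient per min_interval seconds."""

    def __init__(self, min_interval=300.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the logic can be tested
        self._last_sent = {}        # recipient -> time of last allowed send

    def allow(self, recipient):
        now = self.clock()
        last = self._last_sent.get(recipient)
        if last is not None and now - last < self.min_interval:
            return False            # too soon, suppress this message
        self._last_sent[recipient] = now
        return True
```

A caller would check allow(recipient) before handing the message to the mail system and drop or batch anything that is refused.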
>> The ./configure script has a new "--with-configdir" option.
> This greatly simplifies packaging, thanks much!
> Speaking about packaging: are the alert scripts run by remote Pacemaker
> nodes? I couldn't find a description of which nodes run the alert scripts.
> From the mailing list discussions I recall they are run by each node,
> but this would be useful to spell out in the documentation, I think.
Good point. Alert scripts are run only on cluster nodes, but the events
they are called for include remote node events. I'll make sure the
documentation mentions that.
> Similarly for the alert guarantees: I recall there's no such thing, but
> one could also think they are parts of transactions, thus having recovery
> behavior similar to the resource operations. Hmm... wouldn't such
> design actually make sense?
We didn't want to make any cluster operation depend on alert script
success. The only thing we can guarantee is that the cluster will try to
call the alert script for each event. But if the system is going
haywire, for example, we may be unable to spawn a new process due to
some resource exhaustion, and of course the script itself may have problems.
Also, we wanted to minimize the script interface, and keep it
backward-compatible with crm_mon external scripts. We didn't want to add
an OCF-style layer of meta-data, actions and return codes, instead
keeping it as simple as possible for anyone writing one.
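
To illustrate how small an alert script can be under that interface, here is a minimal sketch. It assumes the CRM_alert_* environment variable names described in the alerts documentation (worth verifying against the installed version) and assumes the configured recipient is a log file path; the function names are hypothetical.

```python
import os


def format_alert(env):
    """Build a one-line summary from the alert environment variables."""
    kind = env.get("CRM_alert_kind", "unknown")
    node = env.get("CRM_alert_node", "-")
    desc = env.get("CRM_alert_desc", "")
    stamp = env.get("CRM_alert_timestamp", "")
    return f"{stamp} {kind} node={node} {desc}".strip()


def main(env=None):
    """Append one line per event to the recipient file.
    Returns 0 on success and 1 on failure, and never raises on I/O errors."""
    env = os.environ if env is None else env
    recipient = env.get("CRM_alert_recipient")
    if not recipient:
        return 1
    try:
        with open(recipient, "a") as log:
            log.write(format_alert(env) + "\n")
    except OSError:
        return 1
    return 0
```

A real script would end with sys.exit(main()). There is no meta-data, no action argument and no meaningful contract on the return code, so the script is responsible for handling its own failures.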
Since it's a brand new feature, we definitely want feedback on all
aspects once it's in actual use. If alert script failures turn out to
be a big issue, I could see maybe reporting them in cluster status (and
allowing that to be cleaned up).