[ClusterLabs] Coming in 1.1.15: Event-driven alerts

Wed Apr 27 06:10:10 EDT 2016

On 04/25/2016 01:00 PM, Lars Marowsky-Bree wrote:
> On 2016-04-21T12:50:43, Ken Gaillot <kgaillot at redhat.com> wrote:
>
> Hi all,
>
> awesome to see such a cool new feature land! I do have some
> questions/feedback though.
>
>> The alerts section can have any number of alerts, which look like:
>>
>>    <alert id="alert-1"
>>           path="/srv/pacemaker/pcmk_alert_sample.sh">
>>
>>       <recipient id="alert-1-recipient-1"
>>                  value="/var/log/cluster-alerts.log" />
>>
>>    </alert>
> So, there's one bit of this I dislike - instance_attributes get passed
> via the environment (as always), but the "value" ends up on the
> command-line in ARGV[]? Why?
>
> Wouldn't it make more sense to have an alert-wide instance_attribute
> section within <alert>, that could be overridden on a per-recipient
> basis if needed? And drop the value entirely?
>
> Having things in ARGV[] is always risky due to them being exposed more
> easily via ps. Environment variables or stdin appear better.
What made you assume the recipient is being passed as an argument?

The environment variable CRM_alert_recipient is being used to pass it.
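
To make that concrete, here is a minimal sketch of what it could look like
from the script's side. Only CRM_alert_recipient comes from the actual
implementation; the helper name log_alert and the idea of passing the
message text as its argument are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch only: log_alert is a hypothetical helper, not part of Pacemaker.
# The one thing taken from this thread is that the recipient's "value"
# arrives in the environment variable CRM_alert_recipient, not in "$@".

log_alert() {
    # $1: message text to record (chosen by the caller)
    recipient="${CRM_alert_recipient:-}"
    if [ -z "$recipient" ]; then
        echo "CRM_alert_recipient is not set" >&2
        return 1
    fi
    # Treat the recipient as a log file path, as in the quoted XML example,
    # and append one line per alert.
    printf '%s\n' "$1" >> "$recipient"
}

# Example invocation by hand:
#   CRM_alert_recipient=/tmp/cluster-alerts.log; export CRM_alert_recipient
#   log_alert "test alert"
```

Nothing sensitive shows up in ps this way, since the recipient never
appears on the command line.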

At the moment there are alert-wide instance_attributes that can
be overridden per recipient, as you mention, although we have been
discussing dropping the override because it might cause more
confusion than it is worth.

Anyway, the idea was that an alert would always have something like
a recipient or target, which is why value is a mandatory attribute
there. The other idea behind it is to stay compatible with
scripts written for ClusterMon.

Of course we could prefix all instance_attributes with CRM_alert_ and
CRM_notify_ when mapping them to environment variables. Then recipient
could be just another instance_attribute, overridden by anybody who
wants to use CRM_alert_recipient/CRM_notify_recipient, or left out
entirely by anybody who would rather have e.g.
CRM_alert_logfile/CRM_notify_logfile.
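
As a rough sketch of that mapping (this only illustrates the naming scheme
under discussion, it is not how Pacemaker does it; the helper name
export_prefixed is hypothetical):

```shell
#!/bin/sh
# Sketch only: export_prefixed is a hypothetical helper illustrating the
# proposed naming scheme; it is not taken from the Pacemaker sources.

export_prefixed() {
    # $1: instance attribute name, $2: its value
    name=$1
    value=$2
    # Only accept names that are valid inside a shell variable identifier.
    case $name in
        ''|*[!A-Za-z0-9_]*)
            echo "invalid attribute name: $name" >&2
            return 1
            ;;
    esac
    # Export the value under both prefixes mentioned above.
    for prefix in CRM_alert CRM_notify; do
        export "${prefix}_${name}=${value}"
    done
}

# Example:
#   export_prefixed logfile /var/log/cluster-alerts.log
#   echo "$CRM_alert_logfile"
```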

Without the prefix, one would have to overwrite
CRM_alert_recipient/CRM_notify_recipient directly, which I consider
a little ugly.
On the other hand, naming the environment variables freely
opens up the possibility to use any preexisting script that was not
necessarily written for pacemaker alerts, define a proxy, or do 1000
other things I might or might not think of at the moment.
>
> What I also miss is the ability to filter the events (at least
> coarsely?) sent to a specific alert/recipient, and to constrain on
> which nodes it will get executed.  Is that going to happen? On a busy
> cluster, this could easily cause significant load otherwise.
I'm aware of that, and it sounds reasonable both for reducing the
complexity of the scripts (being able to use generic scripts coming
from anywhere) and out of load considerations.

I was planning to see if I could come up with something easy
and catchy, maybe exploiting the existing possibility
to define rules via the nvpair construct already in use.

The intention of this first release was to have something that can
smoothly replace the existing mechanisms in 1.1.15
and to get feedback on that.
>
> It's also worth pointing out that this could likely "lose" events during
> fail-overs, DC crashes, etc. Users probably should not strictly rely on
> seeing *every* alert in their scripts, so this should be carefully
> documented to not be considered a transactional, reliable message bus.
Proper documentation is still missing anyway.
Thanks for that input...
>
> Regards,
>     Lars
>
Regards,
Klaus