[ClusterLabs] Coming in 1.1.15: Event-driven alerts

Lars Marowsky-Bree lmb at suse.com
Thu Apr 28 15:24:21 UTC 2016


On 2016-04-27T12:10:10, Klaus Wenninger <kwenning at redhat.com> wrote:

> > Having things in ARGV[] is always risky due to them being exposed more
> > easily via ps. Environment variables or stdin appear better.
> What made you assume the recipient is being passed as argument?
> 
> The environment variable CRM_alert_recipient is being used to pass it.

Ah, excellent! But what made me think that this would be passed as
arguments is that your announcement said: "Each alert may have any
number of recipients configured. These values will simply be passed to
the script as *arguments*." ;-)

Thanks for clarifying this.
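
For illustration, here is a minimal sketch of an alert script that takes
the recipient from the CRM_alert_recipient environment variable rather
than from ARGV[], so it never shows up in ps output. Treating the
recipient as a file path to append to, and the helper names
get_recipient and deliver, are assumptions made for this example only;
nothing here is prescribed by Pacemaker.

```python
#!/usr/bin/env python3
import os
import sys


def get_recipient(environ):
    """Return the recipient from the environment mapping, or None if unset/empty."""
    value = environ.get("CRM_alert_recipient", "")
    return value or None


def deliver(recipient, message):
    """Assumption for this sketch: the recipient is a file path we append to."""
    with open(recipient, "a", encoding="utf-8") as handle:
        handle.write(message + "\n")


def main():
    recipient = get_recipient(os.environ)
    if recipient is None:
        # Nothing to do without a recipient; report it and bail out.
        print("CRM_alert_recipient is not set", file=sys.stderr)
        return 1
    deliver(recipient, "cluster alert received")
    return 0


if __name__ == "__main__":
    main()
```

Nothing sensitive is read from sys.argv here, which is the point of the
exercise.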

> > What I also miss is the ability to filter the events (at least
> > coarsely?) sent to a specific alert/recipient, and to constrain on
> > which nodes it will get executed.  Is that going to happen? On a busy
> > cluster, this could easily cause significant load otherwise.
> I'm aware of that and in the light of reducing complexity of the
> scripts / being able to use generic scripts coming from anywhere
> it sounds reasonable as well as out of load considerations.
> 
> I was planning to see if I could come up with something easy
> and catchy maybe exploiting the already existing possibility
> to define rules via the nvpair-construct already used.
> 
> Intention of this first release was to have something that can
> replace the existing mechanisms in a smooth way in 1.1.15
> and to get feedback on that.

Makes sense, thanks. Just curious as to where you saw this going.
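
Until filtering exists in the configuration, a script could do coarse
filtering itself. A rough sketch, assuming the event type arrives in an
environment variable named CRM_alert_kind (that name and the values
"node", "fencing" and "resource" are assumptions here, they were not
mentioned in this thread), with a hypothetical allow-list WANTED_KINDS:

```python
#!/usr/bin/env python3
import os

# Hypothetical allow-list: only these event kinds are acted upon.
WANTED_KINDS = frozenset({"node", "fencing"})


def should_handle(environ, wanted=WANTED_KINDS):
    """Return True if the alert kind in the environment is one we care about."""
    # Assumption: the alert kind is passed as CRM_alert_kind.
    kind = environ.get("CRM_alert_kind", "")
    return kind in wanted


def main():
    if not should_handle(os.environ):
        # Exit early and quietly for events we are not interested in.
        return 0
    print("handling alert of kind", os.environ["CRM_alert_kind"])
    return 0


if __name__ == "__main__":
    main()
```

Note that this only filters after the script has already been started,
so it keeps scripts quiet but does nothing about the load of spawning
them on a busy cluster.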

I'm still a little confused about how I'd control which node this
gets run on. On all nodes, or is it always the DC?

> > It's also worth pointing out that this could likely "lose" events during
> > fail-overs, DC crashes, etc. Users probably should not strictly rely on
> > seeing *every* alert in their scripts, so this should be carefully
> > documented to not be considered a transactional, reliable message bus.
> Proper documentation is anyway still missing.
> Thanks for that input.

Thanks, I didn't mean to complain about this. This was actually
triggered by a recent experience "elsewhere" where someone tried to
build a reliable system on top of such notifications - and then some
were getting lost due to timing ... Best to immediately clarify what the
guarantees on this are ;-)



-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




