[ClusterLabs] Alert notes

Ferenc Wágner wferi at niif.hu
Thu Jun 16 09:05:17 UTC 2016


Klaus Wenninger <kwenning at redhat.com> writes:

> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>
>> Please find some random notes about my adventures testing the new alert
>> system.
>>
>> The first alert example in the documentation has no recipient:
>>
>>     <alert id="my-alert" path="/path/to/my-script.sh" />
>>
>>     In the example above, the cluster will call my-script.sh for each
>>     event.
>>
>> while the next section starts as:
>>
>>     Each alert may be configured with one or more recipients. The cluster
>>     will call the agent separately for each recipient.
>
> The goal of the first example is to be as simple as possible.
> But of course it makes sense to mention that it is not compulsory
> to add a recipient. And I guess it makes sense to point that out
> as it is just ugly to think that you have to fake a recipient while
> it wouldn't make any sense in your context.

I agree.

>> I think the default timestamp should contain date and time zone
>> specification to make it unambiguous.
>
> Idea was to have a trade-off between length and amount of information.

I don't think it's worth saving a couple of bytes by dropping this
information.  In many cases there will be some way to recover it (from
SMTP headers or system logs), but that complicates things.

In a similar vein, keeping the sequence number around would simplify
alert ordering and loss detection on the receiver side.  Especially with
SNMP, where the transport is unreliable as well.
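
To illustrate the receiver side, here is a minimal sketch of loss and
reordering detection.  It assumes each alert carries a per-node integer
sequence number that only ever increases and never wraps; the function
name find_gaps is hypothetical and not part of Pacemaker.

```python
def find_gaps(seqs):
    """Classify sequence numbers given in arrival order.

    Returns (missing, late): numbers never seen so far, and numbers
    that arrived after a higher one (reordered or duplicated).
    Assumes monotonically increasing integers without wraparound.
    """
    missing = set()
    late = []
    highest = None
    for n in seqs:
        if highest is None:
            # First alert seen: nothing to compare against yet.
            highest = n
        elif n > highest:
            # Everything skipped between the old and new maximum is
            # provisionally missing.
            missing.update(range(highest + 1, n))
            highest = n
        else:
            # Arrived out of order (or twice); no longer missing.
            missing.discard(n)
            late.append(n)
    return sorted(missing), late


if __name__ == "__main__":
    print(find_gaps([1, 2, 4, 3, 7]))
```

With an unreliable transport, the receiver can only report a number as
lost after some grace period, since a late datagram may still show up.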

>> (BTW I'd prefer to run the alert scripts as a different user than the
>> various Pacemaker components, but that would lead too far now.)
>
> Well, something we thought about already and a point where the
> new feature breaks the ClusterMon-Interface.
> Unfortunately the impact is quite high - crmd has dropped privileges -
> but if the pain-level rises high enough ...

There's very little room to do this.  You'd need to configure an alert
user and group, and store them in the saved uid/gid set before dropping
privileges for the crmd process.  Or use a separate daemon for sending
alerts, which feels cleaner.

>> The SNMP agent seems to have a problem with hrSystemDate, which should
>> be an OCTETSTR with strict format, not some plain textual timestamp.
>> But I haven't really looked into this yet.
>
> Actually I had tried it with the snmptrap-tool coming with rhel-7.2
> and it worked with the string given in the example.
> Did you copy it 1-1? There is a typo in the document: the
> double quotes are doubled. The format is strict and there are actually
> 2 formats allowed - one with timezone and one without. The
> format string given should match the latter.

You are right.  The snmptrap tool does the string->binary conversion if
it gets the correct format.  Otherwise, if the length matches, it does a
plain cast to binary, interpreting for example 12:34:56.78 as
12594-58-51,52:58:53.54,.55:56.  Looks like the sample SNMP alert agent
shouldn't let the users choose any timestamp-format but
%Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this
in the current design.  Maybe it would be more appropriate to get the
timestamp from crmd as a high resolution (fractional) epoch all the
time, and do the string conversion in the agents as necessary.  One
could still control the format via instance_attributes where allowed.
Or keep around the current mechanism as well to reduce code duplication
in the agents.  Just some ideas...
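
To make both points concrete, here is a minimal sketch.  The first
function reproduces the plain cast described above by reading 11 octets
according to the DateAndTime display hint (2d-1d-1d,1d:1d:1d.1d,1a1d:1d).
The second shows how an agent might build the strict string itself if it
were handed a fractional epoch.  Both function names and the
utc_offset_minutes parameter are hypothetical; nothing here is existing
Pacemaker or Net-SNMP code.

```python
import struct
from datetime import datetime, timedelta, timezone


def cast_as_date_and_time(raw: bytes) -> str:
    """Render 11 raw octets the way a DateAndTime value is displayed."""
    if len(raw) != 11:
        raise ValueError("the DateAndTime form with time zone is 11 octets")
    # 2-octet big-endian year, then nine single octets.
    (year, month, day, hour, minute, second,
     deci, direction, tz_hours, tz_minutes) = struct.unpack(">H9B", raw)
    return (f"{year}-{month}-{day},{hour}:{minute}:{second}.{deci},"
            f"{chr(direction)}{tz_hours}:{tz_minutes}")


def epoch_to_date_and_time(epoch: float, utc_offset_minutes: int = 0) -> str:
    """Format a fractional epoch like %Y-%m-%d,%H:%M:%S.%1N,%:z."""
    tz = timezone(timedelta(minutes=utc_offset_minutes))
    dt = datetime.fromtimestamp(epoch, tz)
    deci = dt.microsecond // 100000          # tenths of a second
    sign = "-" if utc_offset_minutes < 0 else "+"
    hh, mm = divmod(abs(utc_offset_minutes), 60)
    return f"{dt:%Y-%m-%d,%H:%M:%S}.{deci},{sign}{hh:02d}:{mm:02d}"


if __name__ == "__main__":
    # An 11-character time-only string is silently misread as octets.
    print(cast_as_date_and_time(b"12:34:56.78"))
    # The same instant as this mail's date header, shown at UTC+2.
    print(epoch_to_date_and_time(1466067917.25, utc_offset_minutes=120))
```

The first call prints 12594-58-51,52:58:53.54,.55:56, matching the
misinterpretation quoted above; the second prints
2016-06-16,11:05:17.2,+02:00.
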
-- 
Regards,
Feri



