[ClusterLabs] Alert notes
wferi at niif.hu
Thu Jun 16 05:05:17 EDT 2016
Klaus Wenninger <kwenning at redhat.com> writes:
> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>> Please find some random notes about my adventures testing the new alert
>> The first alert example in the documentation has no recipient:
>> <alert id="my-alert" path="/path/to/my-script.sh" />
>> In the example above, the cluster will call my-script.sh for each
>> while the next section starts as:
>> Each alert may be configured with one or more recipients. The cluster
>> will call the agent separately for each recipient.
> The goal of the first example is to be as simple as possible.
> But of course it makes sense to mention that it is not compulsory
> to add a recipient. And I guess it makes sense to point that out
> as it is just ugly to think that you have to fake a recipient while
> it wouldn't make any sense in your context.
>> I think the default timestamp should contain date and time zone
>> specification to make it unambiguous.
> Idea was to have a trade-off between length and amount of information.
I don't think it's worth saving a couple of bytes by dropping this
information. In many cases there will be some way to recover it (from
SMTP headers or system logs), but that complicates things.
In a similar vein, keeping the sequence number around would simplify
alert ordering and loss detection on the receiver side. Especially with
SNMP, where the transport is unreliable as well.
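
The receiver-side check could be sketched roughly as below. This is only an illustration, assuming every alert carries an integer sequence number that grows by one per alert; the function name find_gaps and the plain list input are hypothetical and not part of Pacemaker.

```python
def find_gaps(seqs):
    """Scan sequence numbers in arrival order.

    Returns (missing, out_of_order): the numbers never seen so far, and the
    numbers that arrived after a higher one had already been received.
    """
    missing = set()
    out_of_order = []
    highest = None
    for s in seqs:
        if highest is None:
            # The first alert seen sets the baseline.
            highest = s
        elif s > highest:
            # Everything skipped between the old and new maximum is
            # provisionally lost.
            missing.update(range(highest + 1, s))
            highest = s
        else:
            # A late arrival: it was reordered, not lost.
            out_of_order.append(s)
            missing.discard(s)
    return sorted(missing), out_of_order
```

For arrivals 1, 2, 4, 3, 7 this reports 5 and 6 as missing and 3 as reordered. Without the sequence number the receiver can only fall back on timestamps, which cannot reveal a lost alert.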
>> (BTW I'd prefer to run the alert scripts as a different user than the
>> various Pacemaker components, but that would lead too far now.)
> well, something we thought about already and a point where the
> new feature breaks the ClusterMon-Interface.
> Unfortunately the impact is quite high - crmd has dropped privileges -
> but if the pain-level rises high enough ...
There's very little room to do this. You'd need to configure an alert
user and group, and store them in the saved uid/gid set before dropping
privileges for the crmd process. Or use a separate daemon for sending
alerts, which feels cleaner.
>> The SNMP agent seems to have a problem with hrSystemDate, which should
>> be an OCTETSTR with strict format, not some plain textual timestamp.
>> But I haven't really looked into this yet.
> Actually I had tried it with the snmptrap-tool coming with rhel-7.2
> and it worked with the string given in the example.
> Did you copy it 1-1? There is a typo in the document having the
> double-quotes double. The format is strict and there are actually
> 2 formats allowed - one with timezone and one without. The
> format string given should match the latter.
You are right. The snmptrap tool does the string->binary conversion if
it gets the correct format. Otherwise, if the length matches, it does a
plain cast to binary, interpreting for example 12:34:56.78 as
12594-58-51,52:58:53.54,.55:56. Looks like the sample SNMP alert agent
shouldn't let the users choose any timestamp-format but
%Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this
in the current design. Maybe it would be more appropriate to get the
timestamp from crmd as a high resolution (fractional) epoch all the
time, and do the string conversion in the agents as necessary. One
could still control the format via instance_attributes where allowed.
Or keep around the current mechanism as well to reduce code duplication
in the agents. Just some ideas...
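
To illustrate both points, here is a small sketch. It assumes the 11-octet DateAndTime layout (2-byte year, then month, day, hour, minute, second, deciseconds, a direction character and the UTC offset in hours and minutes). The names raw_cast and format_epoch are hypothetical; neither is part of Pacemaker or of the net-snmp tools. The first function reproduces the plain cast described above, the second shows how an agent might build the strict string from a fractional epoch.

```python
import struct
from datetime import datetime, timedelta, timezone


def raw_cast(text):
    """Read an 11-character string as if it were 11 DateAndTime octets."""
    data = text.encode("ascii")
    if len(data) != 11:
        raise ValueError("expected exactly 11 octets")
    # Big-endian: 2-byte year, six 1-byte fields, direction char, offset h/m.
    year, mon, day, hour, minute, sec, deci, sign, tz_h, tz_m = struct.unpack(
        ">H6Bc2B", data
    )
    return (
        f"{year}-{mon}-{day},{hour}:{minute}:{sec}.{deci},"
        f"{sign.decode('ascii')}{tz_h}:{tz_m}"
    )


def format_epoch(epoch, utc_offset_minutes=0):
    """Render a fractional epoch like date's %Y-%m-%d,%H:%M:%S.%1N,%:z."""
    tz = timezone(timedelta(minutes=utc_offset_minutes))
    dt = datetime.fromtimestamp(epoch, tz)
    deci = dt.microsecond // 100000  # keep a single fractional digit
    sign = "+" if utc_offset_minutes >= 0 else "-"
    tz_h, tz_m = divmod(abs(utc_offset_minutes), 60)
    return f"{dt:%Y-%m-%d,%H:%M:%S}.{deci},{sign}{tz_h:02d}:{tz_m:02d}"
```

raw_cast("12:34:56.78") gives 12594-58-51,52:58:53.54,.55:56, and format_epoch(1466067917.5, -240) gives 2016-06-16,05:05:17.5,-04:00. With the epoch as input, an agent that needs a different representation only has to swap the formatting step.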