[ClusterLabs] Alert notes

Ferenc Wágner wferi at niif.hu
Thu Jun 16 08:40:18 EDT 2016


Klaus Wenninger <kwenning at redhat.com> writes:

> On 06/16/2016 11:05 AM, Ferenc Wágner wrote:
>
>> Klaus Wenninger <kwenning at redhat.com> writes:
>>
>>> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>>>
>>>> I think the default timestamp should contain date and time zone
>>>> specification to make it unambiguous.
>>>
>>> Idea was to have a trade-off between length and amount of information.
>>
>> I don't think it's worth saving a couple of bytes by dropping this
>> information.  In many cases there will be some way to recover it (from
>> SMTP headers or system logs), but that complicates things.
>
> Wasn't about saving some bytes in the size of a file or so but
> rather to keep readability. If the timestamp fills your screen
> you won't be able to read the actual information...have a look
> at /var/log/messages...
> Pure intention was to have a default that creates a kind of nice-looking
> output together with the file-example to give people an impression
> what they could do with the feature.

I see.  Incidentally, the file example is probably the one which would
benefit most from having full timestamps.  And some locking.
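
A rough sketch of what I mean, in Python rather than the shell of the
sample agent: take an exclusive lock around the append and write an ISO
8601 timestamp with date and UTC offset.  The function name and the log
path are made up for illustration, this is not the shipped file example.

```python
import fcntl
from datetime import datetime


def append_alert(path, message, now=None):
    """Append one alert line with a full timestamp, under an exclusive lock."""
    # Hypothetical helper: local time with UTC offset unless a time is given.
    now = now or datetime.now().astimezone()
    line = f"{now.isoformat(timespec='seconds')} {message}\n"
    with open(path, "a") as log:
        # Advisory lock: only helps against writers that also call flock().
        fcntl.flock(log, fcntl.LOCK_EX)
        try:
            log.write(line)
            log.flush()
        finally:
            fcntl.flock(log, fcntl.LOCK_UN)
    return line
```

The lock is advisory, so it only serializes concurrent alert agents that
use the same scheme on the same file.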

>> In a similar vein, keeping the sequence number around would simplify
>> alert ordering and loss detection on the receiver side.  Especially with
>> SNMP, where the transport is unreliable as well.
>
> Nice idea... any OID in mind?

No.  But you can always extend PACEMAKER-MIB.

> Unfortunately the sequence number we have right now as an environment
> variable is not really fit for this purpose. It counts up with each
> and every alert being sent on a single node. So if you have multiple
> alerts configured you would experience gaps that prevent you from
> using it as loss-detection.

I see, it isn't per alert, unfortunately.  Still better than nothing,
though...
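
For illustration, a receiver could use it roughly like this (a Python
sketch with a made-up function name, assuming the receiver collects the
sequence numbers it saw from one node).  With several alerts configured,
the reported gaps are only candidates for lost alerts, since the other
alerts consume numbers too.

```python
def find_gaps(seen):
    """Order the sequence numbers from one node and list the missing ones."""
    # Duplicates (for example retransmissions) are collapsed before ordering.
    ordered = sorted(set(seen))
    missing = []
    for prev, cur in zip(ordered, ordered[1:]):
        # Every number strictly between two neighbours was never received.
        missing.extend(range(prev + 1, cur))
    return ordered, missing
```

Called as find_gaps([5, 3, 4, 8]), this puts the alerts in order 3, 4, 5,
8 and reports 6 and 7 as not received.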

>>>> (BTW I'd prefer to run the alert scripts as a different user than the
>>>> various Pacemaker components, but that would lead too far now.)
>>>
>>> well, something we thought about already and a point where the
>>> new feature breaks the ClusterMon-Interface.
>>> Unfortunately the impact is quite high - crmd has dropped privileges -
>>> but if the pain-level rises high enough ...
>>
>> There's very little room to do this.  You'd need to configure an alert
>> user and group, and store them in the saved uid/gid set before dropping
>> privileges for the crmd process.  Or use a separate daemon for sending
>> alerts, which feels cleaner.
>
> Yes 2nd daemon was the idea. We don't want to give more rights
> to crmd than it needs. Btw. the daemon is there already: lrmd ;-)

It's running as root already, so at least no problem changing to any
user.  And the default could be hacluster.

>> You are right.  The snmptrap tool does the string->binary conversion if
>> it gets the correct format.  Otherwise, if the length matches, it does a
>> plain cast to binary, interpreting for example 12:34:56.78 as
>> 12594-58-51,52:58:53.54,.55:56.  Looks like the sample SNMP alert agent
>> shouldn't let the users choose any timestamp-format but
>> %Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this
>> in the current design.
>
> Well, generic vs. failsafe  ;-)
> Of course one could introduce something like the metadata in RAs
> to achieve things like that but we wanted to keep the ball flat...
> After all the scripts are just examples...and the timestamp-format
> that should work is given in the header of the script...

More emphasis would help, I think.
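
To show why this deserves emphasis, here is a small Python sketch that
mimics the plain cast quoted above.  It is not the snmptrap code, only my
reading of the 11-octet DateAndTime layout (2-byte year, six 1-byte
fields, a direction character, two 1-byte offset fields); the function
name is made up.

```python
import struct


def cast_to_date_and_time(text):
    """Reinterpret an 11-character string as the 11 DateAndTime octets."""
    data = text.encode("ascii")
    if len(data) != 11:
        raise ValueError("a plain cast only happens for 11-byte strings")
    # Big-endian: year (2 bytes), month, day, hour, minute, second,
    # deciseconds, direction from UTC (a character), UTC hours, UTC minutes.
    (year, month, day, hour, minute, second, decisec,
     direction, utc_hours, utc_minutes) = struct.unpack(">H6BcBB", data)
    sign = direction.decode("ascii")
    return (f"{year}-{month}-{day},{hour}:{minute}:{second}.{decisec},"
            f"{sign}{utc_hours}:{utc_minutes}")


print(cast_to_date_and_time("12:34:56.78"))
# prints 12594-58-51,52:58:53.54,.55:56
```

Any 11-character timestamp format gets through this way without an
error, which is what makes the mistake easy to miss.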

>> Maybe it would be more appropriate to get the timestamp from crmd as
>> a high resolution (fractional) epoch all the time, and do the string
>> conversion in the agents as necessary.  One could still control the
>> format via instance_attributes where allowed.  Or keep around the
>> current mechanism as well to reduce code duplication in the agents.
>> Just some ideas...
>
> epoch was actually my first default ...
> additional epoch might be interesting alternative...

It would be useful.  Actually, crm_time_format_hr() currently fails for
any format string ending with any %-escape but N.  For example, "%Yx" is
formatted as "2016x", but "%Y" returns NULL.  You can avoid fixing this
by providing a fractional epoch instead. :)
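
A sketch of the agent-side conversion I have in mind, in Python for
brevity.  It assumes the epoch arrives as a decimal string such as
"1466080818.75"; no such variable exists today, and the function name is
made up.  The output follows the %Y-%m-%d,%H:%M:%S.%1N,%:z layout
mentioned above.

```python
from datetime import datetime, timezone


def format_snmp_timestamp(epoch, tz=timezone.utc):
    """Format a fractional epoch string as date,time.decisecond,offset."""
    # Split the string instead of parsing a float to avoid rounding issues.
    seconds, _, fraction = epoch.partition(".")
    moment = datetime.fromtimestamp(int(seconds), tz)
    decisec = fraction[:1] or "0"      # first fractional digit, like %1N
    offset = moment.strftime("%z")     # "+0000" style
    # Insert the colon that %:z would produce.
    return (f"{moment:%Y-%m-%d,%H:%M:%S}.{decisec},"
            f"{offset[:3]}:{offset[3:]}")


print(format_snmp_timestamp("1466080818.75"))
# prints 2016-06-16,12:40:18.7,+00:00
```
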
-- 
Regards,
Feri



