[ClusterLabs Developers] libqb: Re: 633f262 logging: Remove linker 'magic' and just use statics for logging callsites (#322)

Lars Ellenberg lars.ellenberg at linbit.com
Fri Jan 18 12:53:51 EST 2019

On Thu, Jan 17, 2019 at 09:09:11AM +1100, Andrew Beekhof wrote:
> > On 17 Jan 2019, at 2:59 am, Ken Gaillot <kgaillot at redhat.com> wrote:
> > 
> > I'm not familiar with the reasoning for the current setup, but
> > pacemaker's crm_crit(), crm_error(), etc. use qb_logt(), while
> > crm_debug() and crm_trace() (which won't be used in ordinary runs) do
> > something similar to what you propose.
> > 
> > Pacemaker has about 1,700 logging calls that would be affected (not
> > counting another 2,000 debug/trace). Presumably that means Pacemaker
> > currently has about +16KB of memory overhead and binary size for
> > debug/trace logging static pointers, and that would almost double using
> > them for all logs. Not a big deal today? Or meaningful in an embedded
> > context?
> > 
> > Not sure if that overhead vs runtime trade-off is the original
> > motivation or not, but that's the first thing that comes to mind.
> I believe my interest was the ability to turn them on dynamically in a
> running program (yes, i used it plenty back in the day) and have the
> overhead be minimal for the normal case when they weren't in use.

Also, with libqb before the commit mentioned in the subject (633f262)
and that is what pacemaker is using right now,
you'd get one huge static array of "struct callsites" (in a special
linker section; that's the linker magic that patch removes).

Note: the whole struct was statically allocated,
it is an array of structs, not just an array of pointers.

sizeof(struct qb_log_callsite) is 40

Now, those structs get dynamically allocated,
and put in some lineno based lookup hash.
(so already at least additional 16 bytes),
not counting malloc overhead for all the tiny objects.

The additional 8 byte static pointer
is certainly not "doubling" that overhead.

But can be used to skip the additional lookup,
sprintf, memcpy and whatnot, and even the function call,
if the callsite at hand is currently disabled,
which is probably the case for most >= trace
callsites most of the time.

Any volunteers to benchmark the cpu usage?
I think we'd need
(trace logging: {enabled, disabled})
x ({before 633f262,
    after 633f262,
    after 633f262 + lars patch})

I think without the "linker magic"
(static array of structs),
the _log_filter_apply() becomes a no-op?
That's qb_log_filter_ctl2() at runtime.
It would have to iterate over all the collision lists in all the buckets
of the dynamically allocated callsites, instead of iterating the
(now non-existing) static array of callsites.

One side-effect of not using a static pointer,
but *always* doing the lookup (qb_log_calsite_get()) again,
is that a potentially set _custom_filter_fn() would be called
and that filter applied to the callsite, at each invocation.

But I don't think that that is intentional?

"just saying" :-)



More information about the Developers mailing list