[Mesa-dev] [PATCH] src: replace RTLD_NOW with RTLD_LAZY

Rob Clark robdclark at gmail.com
Sat Aug 6 13:58:20 UTC 2016


On Sat, Aug 6, 2016 at 3:01 AM, Eric Anholt <eric at anholt.net> wrote:
> Rob Clark <robdclark at gmail.com> writes:
>
>> On Fri, Aug 5, 2016 at 8:42 PM, Jan Ziak <0xe2.0x9a.0x9b at gmail.com> wrote:
>>> Mesa source code prior to this patch uses both RTLD_NOW and RTLD_LAZY.
>>> This patch removes all RTLD_NOW in favor of RTLD_LAZY.
>>>
>>> In comparison to early binding, lazy binding reduces CPU instruction count
>>> of small GL apps (e.g: glxinfo) by 6 million instructions.
>>> Larger apps won't notice the difference.
>>
>> tbh, I don't know the background of existing places that use RTLD_LAZY
>> instead of RTLD_NOW (but my experience w/ xserver using LAZY has not
>> been positive, so I think going the other direction seems like a good
>> idea).. But I'm not sure that optimizing for glxinfo is the best goal.
>> I know that at least for freedreno a lot of the startup time for small
>> real gl apps (ie. something that mostly matters for piglit runs) goes
>> to constructing regalloc interference graph..  maybe there is some way
>> to leverage what is being done for on-disk shader cache to cache some
>> of this up-front work and make a meaningful reduction in startup cost
>> for things that actually do a bit more than glxinfo.  (Plus speeding
>> up piglit runs is actually a real world benefit..)
>
> I do think that RTLD_LAZY makes sense, and there's no reason to waste
> the CPU time if we don't need it.  If nothing else, we all run a lot of
> piglit processes that all create contexts.  As far as "what if there are
> unresolved symbols or something?", I think if we have symbols not being
> covered by piglit even once, we've already lost.

Well, for something like shader_runner, I wonder if there is some way
to tell what % of symbols actually get resolved?  Maybe it is lower
than I was expecting.
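
For reference, the two binding modes under discussion are just a flag to
dlopen().  The sketch below is not Mesa code; lookup_symbol() is a
hypothetical helper that opens a library (or the running program itself
when path is NULL) in a given mode and looks up one symbol.  With
RTLD_NOW every undefined function reference in the library is resolved
before dlopen() returns, so a missing symbol makes dlopen() fail; with
RTLD_LAZY those references are resolved on first call, so the cost (and
any failure) is deferred.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only.
 * Opens `path` (NULL means the running program itself) with the given
 * binding mode (RTLD_LAZY or RTLD_NOW) and looks up `symbol`.
 * Returns 0 if both steps succeed, -1 otherwise. */
int lookup_symbol(const char *path, const char *symbol, int mode)
{
    void *handle = dlopen(path, mode);
    if (handle == NULL) {
        /* With RTLD_NOW, unresolved function symbols are reported here. */
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error state before calling dlsym() */
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();

    int result = (err == NULL && addr != NULL) ? 0 : -1;
    if (result != 0)
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");

    dlclose(handle);
    return result;
}
```

Calling lookup_symbol(NULL, "malloc", RTLD_LAZY) and the same with
RTLD_NOW should both succeed in a dynamically linked program; the
difference only shows up in how much relocation work is done at load
time.  On older glibc this needs -ldl at link time.  On glibc,
LD_DEBUG=bindings or LD_DEBUG=statistics may be one way to see how many
symbols actually get bound during a run.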

> For your regalloc, have you looked at i965's direct q value calculation
> in brw_fs_reg_allocate.cpp?  That might save you a ton of time.  That
> said, I was skimming a paper recently that seemed to be saying that if
> you can assume a not-completely-general set of register classes, you can
> do the equivalent of the pq test without the giant table.

I do actually compute the q values, like i965.  I do have more regs
(but have restricted things to fewer classes).  Oh, and there are a bunch
of half-precision regs too, but fewer classes there, since I need to use
full precision for args to texture sample instructions, which removes
a couple of permutations.
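
For anyone not familiar with the p/q values: as I understand the usual
formulation, p(B) is the number of registers in class B, and q(B,C) is
the largest number of registers in B that a single register from C can
conflict with.  A node of class B is trivially colorable if the sum of
q(B, class of neighbor) over its neighbors is less than p(B).  The
sketch below is a brute-force version over a made-up register file
(four half regs, two full regs that each overlap two halves).  It is
not the freedreno or i965 code, and all names in it are hypothetical.

```c
#include <stdbool.h>

/* Toy register file (an assumption for illustration):
 * regs 0..3 are half-precision h0..h3, regs 4..5 are full-precision
 * f0..f1, where f0 overlaps h0/h1 and f1 overlaps h2/h3. */
enum { NUM_REGS = 6, NUM_CLASSES = 2 };
enum { CLASS_HALF = 0, CLASS_FULL = 1 };

static const bool in_class[NUM_CLASSES][NUM_REGS] = {
    [CLASS_HALF] = { true,  true,  true,  true,  false, false },
    [CLASS_FULL] = { false, false, false, false, true,  true  },
};

/* Two registers conflict if they are the same register, or if one is a
 * full register that covers the other (a half register). */
bool regs_conflict(int a, int b)
{
    if (a == b)
        return true;
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    return hi >= 4 && lo < 4 && lo / 2 == hi - 4;
}

/* p(B): number of registers in class b. */
unsigned p_value(int b)
{
    unsigned count = 0;
    for (int r = 0; r < NUM_REGS; r++)
        if (in_class[b][r])
            count++;
    return count;
}

/* q(B,C): max number of registers in class b that conflict with any
 * single register of class c.  Brute force over all register pairs. */
unsigned q_value(int b, int c)
{
    unsigned max = 0;
    for (int r = 0; r < NUM_REGS; r++) {
        if (!in_class[c][r])
            continue;
        unsigned count = 0;
        for (int s = 0; s < NUM_REGS; s++)
            if (in_class[b][s] && regs_conflict(r, s))
                count++;
        if (count > max)
            max = count;
    }
    return max;
}

/* The pq test: a node is trivially colorable if its neighbors cannot
 * block every register in its class, even in the worst case. */
bool trivially_colorable(int node_class, const int *neighbor_classes, int n)
{
    unsigned sum = 0;
    for (int i = 0; i < n; i++)
        sum += q_value(node_class, neighbor_classes[i]);
    return sum < p_value(node_class);
}
```

In this toy setup a half-precision node with one full-precision
neighbor passes the test, while one with two full-precision neighbors
does not, since each full reg can block two halves.  The brute-force
loop is quadratic in the number of registers per class pair, which is
the kind of up-front work a direct calculation avoids.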

Anyways, I haven't looked at it for a while, but it probably just comes
down to overhead being more noticeable on slower devices ;-)

I wouldn't mind having a look at that paper if you can find it again.

BR,
-R

