[Mesa-dev] Reducing get.c size (and get_es1.c and get_es2.c)

Thu May 6 15:27:39 PDT 2010

2010/5/6 Brian Paul <brianp at vmware.com>:
> Kristian Høgsberg wrote:
>>
>> Hi,
>>
>> Ok, I suppose this is not the most pressing issue in mesa, but I was
>> toying with an idea of how to reduce get.c size and integrate
>> get_es1.c and get_es2.c and I had to try it out.  Of course it ended
>> up being a bigger project and took a couple of days, but in the end I
>> think it turned out to be a worthwhile effort.  The result is the two
>> patches on the get-optimagix branch in my personal mesa repo:
>>
>>  http://cgit.freedesktop.org/~krh/mesa/log/?h=get-optimagix
>>
>> The basic idea is that most getters just look up an int somewhere in
>> GLcontext and then convert it to a bool or float according to which of
>> glGetIntegerv() glGetBooleanv() etc is being called.  Instead of
>> generating code to do this, we can just record the enum value and the
>> offset into GLcontext in an array of structs.  Then in glGet*(), we
>> lookup the struct for the enum in question, and use the offset to get
>> the int we need.
>>
>> Of course, sometimes we need to look up a float, a boolean, a bit in a
>> bitfield, a matrix  or other types, so we need to track the type of
>> the value in GLcontext.  And sometimes the value isn't in GLcontext
>> but in the drawbuffer, the array object, current texture unit, or
>> maybe it's a computed value.  So we need to also track where or how to
>> find the value.  Finally, we sometimes need to check that one of a
>> number of extensions are enabled, the gl version or flush or call
>> _mesa_update_state().  This is done by attaching optional extra
>> information to the value description struct, it's sort of like an
>> array of opcodes that describe extra checks or actions.
>>
>> Putting all this together we end up with struct value_desc in the
>> patch, and with a couple of macros to help, the table of struct
>> value_desc is about as concise as the specification in the python
>> code.
>>
>> All we need now is a way to look up the value struct from the enum.
>> The code generated by gcc for the current generated big switch
>> statement is a big, balanced, open coded if/else tree (I'm giving gcc
>> the benefit of the doubt here, I didn't validate that the tree was
>> balanced).  It would be natural to sort the new enum table and use
>> bsearch(), but I decided to use a read-only hash table instead.
>> bsearch() has a nice guaranteed worst case performance, but we're also
>> guaranteed to hit that worst case (log2(n) iterations) for about half
>> the enums.  Instead, using a simple, direct hashing hash table, we can
>> find the enum on the first try for 80% of the enums, 1 collision for
>> 10% and never more than 5 collisions for any enum (typical numbers).
>> And the code is very simple, even though it feels a little magic.
>>
>> Benefits:
>>
>>  1) Smaller. Much smaller.  Generated code is much bigger than the
>> corresponding data tables.  Looking at an i965 DRI driver with GLES1
>> and GLES2 APIs enabled we get:
>>
>> [krh at hinata mesa]$ size lib/i965_dri*.so
>>   text    data     bss     dec     hex filename
>> 2658275   29132   61664 2749071  29f28f lib/i965_dri_old.so
>> 2505275   36980   63712 2605967  27c38f lib/i965_dri.so
>>
>> That is, a 140kb difference, or a 5% size reduction.  And since the
>> reduction is in libmesa.a, it applies to all DRI drivers, which adds
>> up to a nice space savings if you're to squeeze 14 DRI drivers onto a
>> live CD (looking at Fedoras mesa-dri-drivers RPM).
>
> The size savings is probably even greater when compared to what's in Mesa
> 7.8 since I consolidated some error handling code in master a few weeks ago.
>
>
>>  2) Faster; the hash table will find the enum in zero to one
>> iterations most of the time and never more that five. Of course, this
>> is all academic, since glGet*() aren't typically in any kind of
>> hotpath, but it's nice to just verify that we're not replacing get.c
>> with something slower.
>>
>>  2) No code-generation, the C file *is* the spec and is about as
>> concise as the python script was.
>>
>>  3) A non-hacky glGetDoublev().  The current implementation calls
>> glGetFloatv() with a local variable array, which it fills with the
>> magic value -1234.5 to be able to determine how many values was
>> returned from glGetFloatv().  So if your matrix has an entry with the
>> value -1234.5 you're out of luck.
>>
>>  4) A clean way to integrate get.c, get-es1.c and get-es2.c.  We can
>> initialize the hash table with the values that are valid for the API
>> we're initializing and use the same _mesa_Get*() entry points to
>> implement the glGet* functions for the different APIs.
>>
>> Drawbacks:
>>
>>  1) Uhm, regressions?  I went back and double checked the new get.c
>> against the enum list in get_gen.py after finishing the patch.  While
>> I didn't find any inconsistencies, it's a long list and I may have
>> overlooked something.  I'm running piglit on it now, but I suspect
>> I'll have to add a few testcases to hit the different code paths in
>> the new glGet*() implementation.
>>
>>  2) More complex code (though if you consider the get-gen.py script,
>> it's probably about the same total complexity as the current
>> solution).
>>
>> Let me know what you think about this - I'd like to merge it once I've
>> tested it a bit.
>
> Sounds good.  If you could send me the new get.[ch] files I could do a bit
> more testing here.

I can send it to you if you really want, but there are a few changes
in other files (for exampel, to initialize the hash table in
one_time_init()).  It's probably easier if you just pull the branch
from my git tree.  The fastest way to do this is something like:

[krh at hinata mesa-7.8]$ git fetch
git://people.freedesktop.org/~krh/mesa get-optimagix
[krh at hinata mesa-7.8]$ git checkout -b test FETCH_HEAD

from a mesa checkout, which will create a new branch called test where
you can take a look at the new get.c.

Kristian