[Mesa-dev] Reducing get.c size (and get_es1.c and get_es2.c)

Thu May 6 14:45:09 PDT 2010

Kristian Høgsberg wrote:
> Hi,
> 
> Ok, I suppose this is not the most pressing issue in mesa, but I was
> toying with an idea of how to reduce get.c size and integrate
> get_es1.c and get_es2.c and I had to try it out.  Of course it ended
> up being a bigger project and took a couple of days, but in the end I
> think it turned out to be a worthwhile effort.  The result is the two
> patches on the get-optimagix branch in my personal mesa repo:
> 
>   http://cgit.freedesktop.org/~krh/mesa/log/?h=get-optimagix
> 
> The basic idea is that most getters just look up an int somewhere in
> GLcontext and then convert it to a bool or float according to which of
> glGetIntegerv(), glGetBooleanv(), etc. is being called.  Instead of
> generating code to do this, we can just record the enum value and the
> offset into GLcontext in an array of structs.  Then in glGet*(), we
> look up the struct for the enum in question, and use the offset to get
> the int we need.
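
A minimal sketch of the enum-plus-offset table described above, not code
from the patch.  The struct demo_context, its fields, the demo_* names and
the enum values are all hypothetical stand-ins for GLcontext and the real
table, and the linear scan is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GLcontext; the fields are illustrative only. */
struct demo_context {
    int max_texture_size;
    int max_lights;
};

/* One table entry: the enum accepted by glGet*() and where its int lives. */
struct demo_value_desc {
    unsigned pname;
    size_t offset;
};

/* Illustrative enum values, not taken from the patch. */
#define DEMO_MAX_LIGHTS       0x0D31u
#define DEMO_MAX_TEXTURE_SIZE 0x0D33u

static const struct demo_value_desc demo_table[] = {
    { DEMO_MAX_LIGHTS,       offsetof(struct demo_context, max_lights) },
    { DEMO_MAX_TEXTURE_SIZE, offsetof(struct demo_context, max_texture_size) },
};

/* Linear scan, used here for brevity only. */
const struct demo_value_desc *demo_find(unsigned pname)
{
    size_t i;
    for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
        if (demo_table[i].pname == pname)
            return &demo_table[i];
    return NULL;
}

/* Fetch the int at the recorded offset; returns 0 for unknown enums. */
int demo_get_integer(const struct demo_context *ctx, unsigned pname, int *out)
{
    const struct demo_value_desc *d = demo_find(pname);
    assert(ctx != NULL && out != NULL);
    if (d == NULL)
        return 0;
    *out = *(const int *)((const char *)ctx + d->offset);
    return 1;
}

/* The other getters reuse the same lookup and only convert the result. */
int demo_get_float(const struct demo_context *ctx, unsigned pname, float *out)
{
    int v;
    if (!demo_get_integer(ctx, pname, &v))
        return 0;
    *out = (float)v;
    return 1;
}

int demo_get_boolean(const struct demo_context *ctx, unsigned pname,
                     unsigned char *out)
{
    int v;
    if (!demo_get_integer(ctx, pname, &v))
        return 0;
    *out = (unsigned char)(v != 0);
    return 1;
}
```

The point of the sketch is that adding a new enum means adding one table
row rather than one more generated case per getter.
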
> 
> Of course, sometimes we need to look up a float, a boolean, a bit in a
> bitfield, a matrix, or other types, so we need to track the type of
> the value in GLcontext.  And sometimes the value isn't in GLcontext
> but in the drawbuffer, the array object, the current texture unit, or
> maybe it's a computed value.  So we also need to track where or how to
> find the value.  Finally, we sometimes need to check that one of a
> number of extensions is enabled, check the GL version, flush, or call
> _mesa_update_state().  This is done by attaching optional extra
> information to the value description struct; it's sort of like an
> array of opcodes that describe extra checks or actions.
> 
> Putting all this together we end up with struct value_desc in the
> patch, and with a couple of macros to help, the table of struct
> value_desc is about as concise as the specification in the python
> code.
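
To make the description concrete, here is a rough sketch of what a
descriptor with a type, a location and an optional list of extra checks
could look like.  This is not the struct value_desc from the patch: every
name below (demo_*, CONTEXT_FIELD, BUFFER_FIELD, the fields and the enum
values) is made up for illustration, and the rule that an extras list
passes when any one of its entries is satisfied is an assumption of the
sketch.

```c
#include <assert.h>
#include <stddef.h>

enum demo_type  { DEMO_TYPE_INT, DEMO_TYPE_FLOAT, DEMO_TYPE_BOOLEAN };
enum demo_loc   { DEMO_LOC_CONTEXT, DEMO_LOC_BUFFER };
enum demo_extra { DEMO_EXTRA_END, DEMO_EXTRA_EXT_FOO, DEMO_EXTRA_VERSION_2 };

/* Hypothetical stand-ins for the draw buffer and GLcontext. */
struct demo_buffer { int red_bits; };
struct demo_context {
    float point_size;
    unsigned char lighting;
    int foo_limit;
    int ext_foo_enabled;
    int version_major;
    struct demo_buffer *draw_buffer;
};

struct demo_value_desc {
    unsigned pname;               /* enum passed to the getter */
    enum demo_type type;          /* how the stored value is typed */
    enum demo_loc loc;            /* which object the offset applies to */
    size_t offset;
    const enum demo_extra *extra; /* NULL, or a DEMO_EXTRA_END-terminated list */
};

/* Helper macros keep each table row to a single short line. */
#define CONTEXT_FIELD(pname, type, field, extra) \
    { pname, type, DEMO_LOC_CONTEXT, offsetof(struct demo_context, field), extra }
#define BUFFER_FIELD(pname, type, field, extra) \
    { pname, type, DEMO_LOC_BUFFER, offsetof(struct demo_buffer, field), extra }

static const enum demo_extra extra_foo_or_v2[] =
    { DEMO_EXTRA_EXT_FOO, DEMO_EXTRA_VERSION_2, DEMO_EXTRA_END };

static const struct demo_value_desc demo_table[] = {
    CONTEXT_FIELD(0x1001u, DEMO_TYPE_FLOAT,   point_size, NULL),
    CONTEXT_FIELD(0x1002u, DEMO_TYPE_BOOLEAN, lighting,   NULL),
    CONTEXT_FIELD(0x1003u, DEMO_TYPE_INT,     foo_limit,  extra_foo_or_v2),
    BUFFER_FIELD (0x1004u, DEMO_TYPE_INT,     red_bits,   NULL),
};

/* Assumption of this sketch: the list passes if any one entry is satisfied. */
static int demo_extras_ok(const struct demo_context *ctx,
                          const enum demo_extra *e)
{
    if (e == NULL)
        return 1;
    for (; *e != DEMO_EXTRA_END; e++) {
        if (*e == DEMO_EXTRA_EXT_FOO && ctx->ext_foo_enabled)
            return 1;
        if (*e == DEMO_EXTRA_VERSION_2 && ctx->version_major >= 2)
            return 1;
    }
    return 0;
}

/* One getter handles every type; returns 0 if unknown or a check fails. */
int demo_get_double(const struct demo_context *ctx, unsigned pname, double *out)
{
    size_t i;
    assert(ctx != NULL && out != NULL);
    for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        const struct demo_value_desc *d = &demo_table[i];
        const char *base;
        if (d->pname != pname)
            continue;
        if (!demo_extras_ok(ctx, d->extra))
            return 0;
        base = (d->loc == DEMO_LOC_BUFFER) ? (const char *)ctx->draw_buffer
                                           : (const char *)ctx;
        switch (d->type) {
        case DEMO_TYPE_INT:
            *out = *(const int *)(base + d->offset);
            break;
        case DEMO_TYPE_FLOAT:
            *out = *(const float *)(base + d->offset);
            break;
        case DEMO_TYPE_BOOLEAN:
            *out = *(const unsigned char *)(base + d->offset) ? 1.0 : 0.0;
            break;
        }
        return 1;
    }
    return 0;
}
```
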
> 
> All we need now is a way to look up the value struct from the enum.
> The code generated by gcc for the current generated big switch
> statement is a big, balanced, open coded if/else tree (I'm giving gcc
> the benefit of the doubt here, I didn't validate that the tree was
> balanced).  It would be natural to sort the new enum table and use
> bsearch(), but I decided to use a read-only hash table instead.
> bsearch() has a nice guaranteed worst case performance, but we're also
> guaranteed to hit that worst case (log2(n) iterations) for about half
> the enums.  Instead, using a simple, direct hashing hash table, we can
> find the enum on the first try for 80% of the enums, after 1 collision
> for 10%, and after at most 5 collisions for any enum (typical numbers).
> And the code is very simple, even though it feels a little magic.
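
For readers who want to see the shape of such a lookup, below is a small
sketch of a hash table that is filled once and then only read.  It is not
taken from the patch: the table size, the multiplier and probe step, the
demo_* names and the sample enum values are all arbitrary choices made
for the example.  Slots hold indices into the descriptor array, with 0
meaning "empty", so entry 0 of the array is a reserved dummy.

```c
#include <assert.h>
#include <stddef.h>

struct demo_value_desc { unsigned pname; int value; };

static const struct demo_value_desc demo_values[] = {
    { 0, 0 },            /* index 0 is reserved: a 0 slot means "empty" */
    { 0x0B00u, 10 },
    { 0x0B01u, 11 },
    { 0x0B10u, 12 },
    { 0x0D31u, 13 },
    { 0x0D33u, 14 },
    { 0x8869u, 15 },
};

#define DEMO_NUM_VALUES  (sizeof(demo_values) / sizeof(demo_values[0]))
#define DEMO_TABLE_SIZE  16u   /* power of two, larger than the entry count */
#define DEMO_TABLE_MASK  (DEMO_TABLE_SIZE - 1u)
#define DEMO_HASH_FACTOR 89u   /* arbitrary multiplier for this sketch */
#define DEMO_HASH_STEP   7u    /* odd, so probing can reach every slot */

static unsigned short demo_table[DEMO_TABLE_SIZE];

/* Fill the table once at startup; afterwards it is only read. */
void demo_init_table(void)
{
    size_t i;
    for (i = 0; i < DEMO_TABLE_SIZE; i++)
        demo_table[i] = 0;
    for (i = 1; i < DEMO_NUM_VALUES; i++) {
        unsigned hash = demo_values[i].pname * DEMO_HASH_FACTOR;
        /* Step past occupied slots until a free one turns up. */
        while (demo_table[hash & DEMO_TABLE_MASK] != 0)
            hash += DEMO_HASH_STEP;
        demo_table[hash & DEMO_TABLE_MASK] = (unsigned short)i;
    }
}

/* Returns the descriptor, or NULL if pname is not in the table.
 * If probes is non-NULL it receives the number of collisions skipped. */
const struct demo_value_desc *demo_find(unsigned pname, unsigned *probes)
{
    unsigned hash = pname * DEMO_HASH_FACTOR;
    unsigned n;
    for (n = 0; n < DEMO_TABLE_SIZE; n++) {
        unsigned short idx = demo_table[hash & DEMO_TABLE_MASK];
        if (idx == 0)
            return NULL;              /* empty slot: enum is unknown */
        if (demo_values[idx].pname == pname) {
            if (probes != NULL)
                *probes = n;
            return &demo_values[idx];
        }
        hash += DEMO_HASH_STEP;       /* collision: try the next slot */
    }
    return NULL;
}
```

Because the set of enums is fixed, the constants can be tuned offline
until the collision counts look acceptable, which is presumably where
figures like the ones quoted above come from.
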
> 
> Benefits:
> 
>  1) Smaller. Much smaller.  Generated code is much bigger than the
> corresponding data tables.  Looking at an i965 DRI driver with GLES1
> and GLES2 APIs enabled we get:
> 
> [krh@hinata mesa]$ size lib/i965_dri*.so
>    text	   data	    bss	    dec	    hex	filename
> 2658275	  29132	  61664	2749071	 29f28f	lib/i965_dri_old.so
> 2505275	  36980	  63712	2605967	 27c38f	lib/i965_dri.so
> 
> That is, a 140 KB difference, or a 5% size reduction.  And since the
> reduction is in libmesa.a, it applies to all DRI drivers, which adds
> up to a nice space savings if you're trying to squeeze 14 DRI drivers
> onto a live CD (looking at Fedora's mesa-dri-drivers RPM).

The size savings is probably even greater when compared to what's in 
Mesa 7.8 since I consolidated some error handling code in master a few 
weeks ago.

>  2) Faster; the hash table will find the enum in zero to one
> iterations most of the time and never more than five. Of course, this
> is all academic, since glGet*() aren't typically in any kind of
> hotpath, but it's nice to just verify that we're not replacing get.c
> with something slower.
> 
>  3) No code generation; the C file *is* the spec and is about as
> concise as the python script was.
> 
>  4) A non-hacky glGetDoublev().  The current implementation calls
> glGetFloatv() with a local variable array, which it fills with the
> magic value -1234.5 to be able to determine how many values were
> returned from glGetFloatv().  So if your matrix has an entry with the
> value -1234.5 you're out of luck.
> 
>  5) A clean way to integrate get.c, get_es1.c and get_es2.c.  We can
> initialize the hash table with the values that are valid for the API
> we're initializing and use the same _mesa_Get*() entry points to
> implement the glGet* functions for the different APIs.
> 
> Drawbacks:
> 
>  1) Uhm, regressions?  I went back and double checked the new get.c
> against the enum list in get_gen.py after finishing the patch.  While
> I didn't find any inconsistencies, it's a long list and I may have
> overlooked something.  I'm running piglit on it now, but I suspect
> I'll have to add a few testcases to hit the different code paths in
> the new glGet*() implementation.
> 
>  2) More complex code (though if you consider the get_gen.py script,
> it's probably about the same total complexity as the current
> solution).
> 
> Let me know what you think about this - I'd like to merge it once I've
> tested it a bit.

Sounds good.  If you could send me the new get.[ch] files I could do a 
bit more testing here.

-Brian