[Mesa-dev] Reducing get.c size (and get_es1.c and get_es2.c)

Thu May 6 13:54:53 PDT 2010

Hi,

Ok, I suppose this is not the most pressing issue in mesa, but I was
toying with an idea of how to reduce get.c size and integrate
get_es1.c and get_es2.c and I had to try it out.  Of course it ended
up being a bigger project and took a couple of days, but in the end I
think it turned out to be a worthwhile effort.  The result is the two
patches on the get-optimagix branch in my personal mesa repo:

  http://cgit.freedesktop.org/~krh/mesa/log/?h=get-optimagix

The basic idea is that most getters just look up an int somewhere in
GLcontext and then convert it to a bool or float according to which of
glGetIntegerv(), glGetBooleanv(), etc. is being called.  Instead of
generating code to do this, we can just record the enum value and the
offset into GLcontext in an array of structs.  Then in glGet*(), we
look up the struct for the enum in question, and use the offset to get
the int we need.
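
To make the enum-plus-offset idea concrete, here is a minimal sketch.
It is not the code from the patch: struct fake_context stands in for
GLcontext, the FAKE_* names and the helper functions are made up for
illustration, and the lookup is a plain linear search.

```c
#include <stddef.h>

/* Hypothetical stand-in for GLcontext; the real struct is far larger. */
struct fake_context {
    int max_lights;
    int max_texture_size;
};

/* Illustrative enum values, assumed for this sketch only. */
#define FAKE_MAX_LIGHTS        0x0D31
#define FAKE_MAX_TEXTURE_SIZE  0x0D33

/* One table row: the enum and where its int lives in the context. */
struct simple_desc {
    unsigned pname;   /* enum value passed to glGet*() */
    size_t offset;    /* byte offset of the int inside fake_context */
};

static const struct simple_desc simple_table[] = {
    { FAKE_MAX_LIGHTS,       offsetof(struct fake_context, max_lights) },
    { FAKE_MAX_TEXTURE_SIZE, offsetof(struct fake_context, max_texture_size) },
};

/* Find the row for pname and read the int at its offset.
 * Returns 1 on success, 0 for an unknown enum. */
static int get_integer(const struct fake_context *ctx, unsigned pname, int *out)
{
    size_t i;

    for (i = 0; i < sizeof simple_table / sizeof simple_table[0]; i++) {
        if (simple_table[i].pname == pname) {
            const char *base = (const char *) ctx;
            *out = *(const int *) (base + simple_table[i].offset);
            return 1;
        }
    }
    return 0;
}

/* The other getters share the lookup and differ only in the conversion. */
static int get_float(const struct fake_context *ctx, unsigned pname, float *out)
{
    int v;

    if (!get_integer(ctx, pname, &v))
        return 0;
    *out = (float) v;
    return 1;
}

static int get_boolean(const struct fake_context *ctx, unsigned pname,
                       unsigned char *out)
{
    int v;

    if (!get_integer(ctx, pname, &v))
        return 0;
    *out = (unsigned char) (v != 0);
    return 1;
}
```

The point is that adding a new enum means adding one table row, not
three generated switch cases.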

Of course, sometimes we need to look up a float, a boolean, a bit in a
bitfield, a matrix or other types, so we need to track the type of
the value in GLcontext.  And sometimes the value isn't in GLcontext
but in the drawbuffer, the array object, current texture unit, or
maybe it's a computed value.  So we need to also track where or how to
find the value.  Finally, we sometimes need to check that one of a
number of extensions is enabled, check the GL version, flush, or call
_mesa_update_state().  This is done by attaching optional extra
information to the value description struct; it's sort of like an
array of opcodes that describe extra checks or actions.

Putting all this together we end up with struct value_desc in the
patch, and with a couple of macros to help, the table of struct
value_desc is about as concise as the specification in the python
code.
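
For readers who don't want to open the patch, the shape of such a
descriptor could look roughly like the sketch below.  Every name in it
(the sketch_* types, the CTX/BUF macros, the EXTRA_* opcodes, the enum
values 0x1001..0x1004) is an assumption made for illustration; the
real struct value_desc in the branch may differ in fields and layout.

```c
#include <stddef.h>

/* All names below are hypothetical; they only mirror the ideas above. */
enum sketch_type  { TYPE_INT, TYPE_FLOAT, TYPE_BOOLEAN };
enum sketch_loc   { LOC_CONTEXT, LOC_BUFFER };
enum sketch_extra { EXTRA_END, EXTRA_FLUSH, EXTRA_EXT_FOO };

struct sketch_buffer {
    int red_bits;
};

struct sketch_context {
    float point_size;
    unsigned char lighting_enabled;
    int ext_foo_enabled;              /* pretend extension flag */
    int foo_limit;                    /* only valid with the extension */
    int flushes;                      /* counts simulated flushes */
    struct sketch_buffer *draw_buffer;
};

struct sketch_value_desc {
    unsigned pname;       /* enum passed to glGet*() */
    unsigned char loc;    /* which object the offset is relative to */
    unsigned char type;   /* how to interpret the bytes at that offset */
    size_t offset;
    const int *extra;     /* optional EXTRA_END-terminated opcode list */
};

/* Helper macros that keep the table about as terse as a spec. */
#define CTX(field, t) LOC_CONTEXT, t, offsetof(struct sketch_context, field)
#define BUF(field, t) LOC_BUFFER, t, offsetof(struct sketch_buffer, field)

static const int extra_flush[]   = { EXTRA_FLUSH, EXTRA_END };
static const int extra_ext_foo[] = { EXTRA_EXT_FOO, EXTRA_END };

static const struct sketch_value_desc sketch_values[] = {
    { 0x1001, CTX(point_size, TYPE_FLOAT), NULL },
    { 0x1002, CTX(lighting_enabled, TYPE_BOOLEAN), extra_flush },
    { 0x1003, CTX(foo_limit, TYPE_INT), extra_ext_foo },
    { 0x1004, BUF(red_bits, TYPE_INT), NULL },
};

/* Run the extra opcodes, then return a pointer to the raw value,
 * or NULL when a required extension is missing. */
static const void *sketch_find(struct sketch_context *ctx,
                               const struct sketch_value_desc *d)
{
    const int *e;
    const char *base;

    for (e = d->extra; e != NULL && *e != EXTRA_END; e++) {
        if (*e == EXTRA_FLUSH)
            ctx->flushes++;
        else if (*e == EXTRA_EXT_FOO && !ctx->ext_foo_enabled)
            return NULL;
    }
    base = d->loc == LOC_BUFFER ? (const char *) ctx->draw_buffer
                                : (const char *) ctx;
    return base + d->offset;
}

/* Convert whatever type is stored into the float glGetFloatv() wants. */
static float sketch_as_float(const void *p, unsigned char type)
{
    switch (type) {
    case TYPE_FLOAT:   return *(const float *) p;
    case TYPE_BOOLEAN: return *(const unsigned char *) p ? 1.0f : 0.0f;
    default:           return (float) *(const int *) p;
    }
}
```

One generic find function plus one conversion function per output type
is all the code there is; everything else is data.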

All we need now is a way to look up the value struct from the enum.
The code generated by gcc for the current generated big switch
statement is a big, balanced, open coded if/else tree (I'm giving gcc
the benefit of the doubt here, I didn't validate that the tree was
balanced).  It would be natural to sort the new enum table and use
bsearch(), but I decided to use a read-only hash table instead.
bsearch() has a nice guaranteed worst case performance, but we're also
guaranteed to hit that worst case (log2(n) iterations) for about half
the enums.  Instead, using a simple, direct hashing hash table, we can
find the enum on the first try for 80% of the enums, 1 collision for
10% and never more than 5 collisions for any enum (typical numbers).
And the code is very simple, even though it feels a little magic.
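
As a rough illustration of the lookup, here is a sketch of a small
read-only, open-addressing hash table.  The table size, the multiplier
and the sample enum values are assumptions picked for the example, not
the constants used in the patch, and the probe sequence here is plain
linear probing.

```c
#include <stddef.h>

#define TABLE_SIZE  16    /* power of two, larger than the entry count */
#define HASH_FACTOR 89u   /* arbitrary odd multiplier, assumed here */

/* Sample enum values; in the real thing these come from the
 * value description table. */
static const unsigned known_enums[] = {
    0x0B00, 0x0B01, 0x0D31, 0x0D33, 0x8074
};
#define NUM_ENUMS (sizeof known_enums / sizeof known_enums[0])

/* Each slot holds index+1 into known_enums; 0 means empty. */
static unsigned short hash_slots[TABLE_SIZE];

static unsigned hash_enum(unsigned pname)
{
    return (pname * HASH_FACTOR) & (TABLE_SIZE - 1);
}

/* Fill the table once at init time; it is never modified afterwards. */
static void build_table(void)
{
    size_t i;

    for (i = 0; i < TABLE_SIZE; i++)
        hash_slots[i] = 0;

    for (i = 0; i < NUM_ENUMS; i++) {
        unsigned slot = hash_enum(known_enums[i]);

        while (hash_slots[slot] != 0)
            slot = (slot + 1) & (TABLE_SIZE - 1);
        hash_slots[slot] = (unsigned short) (i + 1);
    }
}

/* Returns the index into known_enums, or -1 for an unknown enum.
 * If collisions is non-NULL it receives the number of extra probes.
 * The loop terminates because the table always has empty slots. */
static int lookup_enum(unsigned pname, unsigned *collisions)
{
    unsigned slot = hash_enum(pname);
    unsigned probes = 0;

    while (hash_slots[slot] != 0) {
        int idx = hash_slots[slot] - 1;

        if (known_enums[idx] == pname) {
            if (collisions != NULL)
                *collisions = probes;
            return idx;
        }
        slot = (slot + 1) & (TABLE_SIZE - 1);
        probes++;
    }
    return -1;
}
```

Because the set of enums is fixed at build time, the table size and
multiplier can be tuned offline until the collision counts look good.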

Benefits:

 1) Smaller. Much smaller.  Generated code is much bigger than the
corresponding data tables.  Looking at an i965 DRI driver with GLES1
and GLES2 APIs enabled we get:

[krh@hinata mesa]$ size lib/i965_dri*.so
   text	   data	    bss	    dec	    hex	filename
2658275	  29132	  61664	2749071	 29f28f	lib/i965_dri_old.so
2505275	  36980	  63712	2605967	 27c38f	lib/i965_dri.so

That is, a 140 KB difference in total size (the dec column), or a 5%
size reduction.  And since the reduction is in libmesa.a, it applies
to all DRI drivers, which adds up to a nice space savings if you're
trying to squeeze 14 DRI drivers onto a live CD (looking at Fedora's
mesa-dri-drivers RPM).

 2) Faster; the hash table will find the enum in zero to one
iterations most of the time and never more than five. Of course, this
is all academic, since glGet*() aren't typically in any kind of
hotpath, but it's nice to just verify that we're not replacing get.c
with something slower.

 3) No code generation; the C file *is* the spec and is about as
concise as the python script was.

 4) A non-hacky glGetDoublev().  The current implementation calls
glGetFloatv() with a local variable array, which it fills with the
magic value -1234.5 to be able to determine how many values were
returned from glGetFloatv().  So if your matrix has an entry with the
value -1234.5 you're out of luck.

 5) A clean way to integrate get.c, get_es1.c and get_es2.c.  We can
initialize the hash table with the values that are valid for the API
we're initializing and use the same _mesa_Get*() entry points to
implement the glGet* functions for the different APIs.
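
To show why the glGetDoublev() point in the list above matters, here
is a small sketch of the magic-value trick next to a version that
knows the value count up front.  It is written from the description
above rather than copied from Mesa, and all function names are
hypothetical.

```c
#define MAGIC      (-1234.5f)
#define MAX_VALUES 16

/* Stand-in for glGetFloatv(): writes count floats into out. */
static void sketch_get_floatv(const float *state, int count, float *out)
{
    int i;

    for (i = 0; i < count; i++)
        out[i] = state[i];
}

/* Magic-value approach: prefill with a sentinel, then copy until the
 * sentinel is seen.  Returns the number of values it thinks it got. */
static int sketch_get_doublev_magic(const float *state, int count,
                                    double *out)
{
    float tmp[MAX_VALUES];
    int i;

    for (i = 0; i < MAX_VALUES; i++)
        tmp[i] = MAGIC;
    sketch_get_floatv(state, count, tmp);
    for (i = 0; i < MAX_VALUES && tmp[i] != MAGIC; i++)
        out[i] = tmp[i];
    return i;
}

/* Table-driven approach: the descriptor's type already tells us how
 * many values there are, so no sentinel is needed. */
static int sketch_get_doublev_counted(const float *state, int count,
                                      double *out)
{
    int i;

    for (i = 0; i < count; i++)
        out[i] = state[i];
    return count;
}
```

With a 16-entry matrix whose sixth element happens to be -1234.5, the
magic-value version stops after five values, while the counted version
returns all sixteen.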

Drawbacks:

 1) Uhm, regressions?  I went back and double checked the new get.c
against the enum list in get_gen.py after finishing the patch.  While
I didn't find any inconsistencies, it's a long list and I may have
overlooked something.  I'm running piglit on it now, but I suspect
I'll have to add a few testcases to hit the different code paths in
the new glGet*() implementation.

 2) More complex code (though if you consider the get_gen.py script,
it's probably about the same total complexity as the current
solution).

Let me know what you think about this - I'd like to merge it once I've
tested it a bit.

Kristian