[Mesa-dev] [PATCH 3/4] glsl: add ir_cache class and functions for shader serialization

Mon Oct 28 21:09:18 CET 2013

Before we start, I have a clarifying question: Is the binary format
intended to be the same for 32- and 64-bit builds, or different?  At least
one of my comments below will depend on the answer to this question.

On 24 October 2013 01:28, Tapani Pälli <tapani.palli at intel.com> wrote:

> +/* cache specific debug output */
> +#ifdef SHADER_CACHE_DEBUG
> +#define CACHE_DEBUG(fmt, args...) printf(fmt, ## args)
> +#else
> +#define CACHE_DEBUG(fmt, args...) do {} while (0)
> +#endif
>

This is the gcc-specific variadic argument syntax.  To be compatible with
MSVC, we have to use the C99 syntax, which is:

#ifdef SHADER_CACHE_DEBUG
#define CACHE_DEBUG(fmt, ...) printf(fmt, ## __VA_ARGS__)
#else
#define CACHE_DEBUG(fmt, ...) do {} while (0)
#endif

> +
> +/* C API for the cache */
> +#ifdef __cplusplus
> +#define _EXTC extern "C"
> +#else
> +#define _EXTC
> +#endif
> +
> +_EXTC char *_mesa_shader_serialize(struct gl_shader *shader,
> +   struct _mesa_glsl_parse_state *state,
> +   const char *mesa_sha, size_t *size);
> +
> +_EXTC struct gl_shader *_mesa_shader_unserialize(void *mem_ctx,
> +   void *blob, const char *mesa_sha, size_t size);
> +
> +_EXTC char *_mesa_program_serialize(struct gl_shader_program *prog,
> +   size_t *size, const char *mesa_sha);
> +
> +_EXTC int _mesa_program_unserialize(struct gl_shader_program *prog,
> +   const GLvoid *blob, size_t size, const char *mesa_sha);
>

This _EXTC stuff is nonstandard.  What we typically do is to surround the
block with

#ifdef __cplusplus
extern "C" {
#endif

and

#ifdef __cplusplus
} /* extern "C" */
#endif

See, for example, src/glsl/program.h.

> +
> +#ifdef __cplusplus
> +
> +
> +/* helper class for writing data to memory */
>

Please use doxygen-formatted comments for documenting structs, classes,
functions, and struct/class members, i.e.:

/**
 * blah blah blah
 */

Also, comments of the form "helper class for XXX" are generally hard to
follow.  We really need a comment explaining what the class does.  For
example, in this case, maybe it should be something like:

/**
 * This class maintains a dynamically-sized memory buffer and allows for
data
 * to be efficiently appended to it with automatic resizing.
 */

> +struct memory_writer
>

Let's make this a class rather than a struct to clarify that it's C++ only.

> +{
> +public:
> +   memory_writer() :
> +      memory(NULL),
> +      memory_p(NULL),
> +      curr_size(0),
> +      pos(0)
> +   { }
> +
> +   /* NOTE - there is no dtor to free memory, user is responsible */
> +   void free_memory()
> +   {
> +      if (memory)
> +         free(memory);
> +   }
>

I'm assuming the reason you did this rather than write a destructor was
that you wanted the option of having the user of the class claim ownership
of the memory by calling mem()?

The safer way to do this is to go ahead and write the destructor, and
change mem() to something like this:

inline char *release_memory(size_t *size)
{
   char *result = memory;
   *size = pos;
   memory = NULL;
   memory_p = NULL;
   curr_size = 0;
   pos = 0;
   return result;
}

That way there is no danger of someone forgetting to call free_memory() (or
calling it when they shouldn't).

> +
> +   /* realloc more memory */
> +   int grow(int32_t size)
>

This function should be private.

> +   {
> +      char *more_mem = (char *) realloc(memory, curr_size + size);
>

Growing the allocation to just curr_size + size will give this class O(n^2)
performance (since in general, every call to write() will require a call to
grow(), which may need to memcpy all of the previously-written data).  What
you need to do is something like this (ignoring error handling):

size_t new_size = 2 * (curr_size + size);
char *more_mem = (char *) realloc(memory, new_size);
memory = more_mem;
memory_p = memory + pos;
curr_size = new_size;

By doubling the size on each call to grow() we guarantee that we won't have
to call realloc() more than log(n) times, so the total runtime of all the
realloc's will be bounded by O(n).

Note: if we do this, then sometimes when we're done serializing we'll have
allocated more memory than we need.  To avoid wasting memory over the long
term we should probably do a final realloc() down to the exact size
needed.  Assuming you take my advice about release_memory() above, an easy
way to do that would be to do a final realloc() in release_memory().  Note
that this final realloc() will be efficient because it's guaranteed not to
need to memcpy the data.

> +      if (more_mem == NULL) {
> +         free(memory);
> +         memory = NULL;
> +         return -1;
> +      } else {
> +         memory = more_mem;
> +         memory_p = memory + pos;
> +         curr_size += size;
> +         return 0;
> +      }
> +   }
>

Generally in Mesa we don't worry too much about memory allocation failures,
since in an out of memory condition it's not really feasible to recover.
I'd recommend eliminating the out-of-memory-handling code and changing the
return type of grow() (and the write() functions) to void.

If we do decide to keep the out-of-memory handling, then:
- return type of grow() and write() should be bool, and they should return
true on success.
- we need to change all the code that calls write() to make sure that it
properly checks the return value.

> +
> +   /* write functions for different types */
> +#define _WRITE_TYPE(type) inline int write(type *val) {\
> +   return write(val, sizeof(*val), 1, -1);\
> +}
> +
> +   _WRITE_TYPE(uint8_t)
> +   _WRITE_TYPE(int32_t)
> +   _WRITE_TYPE(uint32_t)
> +   _WRITE_TYPE(bool)
> +   _WRITE_TYPE(gl_texture_index)
> +   _WRITE_TYPE(ir_node_type)
> +   _WRITE_TYPE(ir_loop_jump::jump_mode)
> +   _WRITE_TYPE(ir_expression_operation)
> +   _WRITE_TYPE(GLbitfield64)
>

Because of implicit type conversions, these definitions essentially allow
any integer type to be passed to write(), and that increases the risk of
mistakes.  For example, I believe this would be allowed, in both 32 and 64
bit builds:

size_t s;
write(s);

But in 32 bit builds it'll resolve to _WRITE_TYPE(uint32_t) and in 64 bit
builds it'll resolve to _WRITE_TYPE(GLbitfield64).

I'd prefer if we just made the type explicit in each write function, i.e.
write_uint8(), write_int32(), write_bool(), and so on.  It's a little more
typing at the call site, but it carries the advantage that we always know
exactly what we're writing.

> +
> +   int write(const void *data, int32_t size, int32_t nmemb,
> +      int32_t offset = -1)
>

It looks like nearly all callers set either size or nmemb to 1.  I think
the code would be easier to follow if we just had a single size argument,
and in the rare cases where we are serializing multiple objects whose size
!= 1, just do the multiply in the caller.

> +   {
> +      int32_t amount = size * nmemb;
> +      int32_t extra = 8192;
> +
> +      /* reallocate more if does not fix to current memory */
> +      if (!memory || pos > (int32_t)(curr_size - amount))
> +         if (grow(amount + extra))
> +            return -1;
>

For the case where offset != -1, this will do the wrong thing (it will try
to reserve memory when it's not necessary to do so).  It looks like in all
cases where offset != -1, the necessary memory is guaranteed to be already
reserved (because we are overwriting a value we wrote previously) so I'd
recommend that in this case we simply do:

assert(amount + extra <= pos);

Also, given how differently this function needs to behave when offset !=
-1, and given how infrequently it is called with an offset other than -1,
I'd recommend breaking the offset != -1 case out to its own function (maybe
call it overwrite()).

> +
> +      /**
> +       * by default data is written to pos memory_p, however
> +       * if an offset value >= 0 is passed then that value is
> +       * used as offset from start of the memory blob
> +       */
> +      char *dst = memory_p;
> +
> +      if (offset > -1)
> +         dst = memory + offset;
> +
> +      memcpy(dst, data, amount);
> +
> +      /* if no offset given, forward the pointer */
> +      if (offset == -1) {
> +         memory_p += amount;
> +         pos += amount;
> +      }
> +      return 0;
> +   }
> +
> +   int write_string(const char *str)
> +   {
> +      if (!str)
> +         return -1;
> +      uint32_t len = strlen(str);
> +      write(&len);
> +      write(str, 1, len);
> +      return 0;
> +   }
> +
> +   inline int32_t position() { return pos; }
> +   inline char *mem() { return memory; }
> +
> +private:
> +   char *memory;
> +   char *memory_p;
> +   int32_t curr_size;
> +   int32_t pos;
>

Please add a short comment above each of these private data members
explaining what it stores.

I'm not comfortable with storing both memory_p and pos in this class--they
are redundant, so it means we have to always double check that we keep the
two of them in sync.  I'd suggest getting rid of memory_p, since that looks
like it would require the smallest number of changes to the rest of the
class.

> +};
> +
> +
> +/* helper for string serialization */
>

Maybe this comment could be something like:

/**
 * Wrapper for a C string that manages allocation and deallocation of the
 * underlying memory.
 */

> +struct string_data
>

This should also be a class.

> +{
> +public:
> +   string_data() : len(0), data(NULL) {}
> +   string_data(const char *cstr) :
> +      data(NULL)
> +   {
> +      if (cstr) {
>

It looks like we almost never expect to pass NULL to the string_data
constructor (other than by mistake).  I'd prefer to change things so that
we never pass NULL to the string_data constructor, and get rid of this
test, so that if we ever do pass NULL to string_data by mistake, we'll
notice the bug as quickly as possible.

> +         len = strlen(cstr);
> +         data = _mesa_strdup(cstr);
> +      }
> +   }
> +
> +   ~string_data() {
> +      if (data) {
> +         free(data);
> +         data = NULL;
>

It's not necessary to set data to NULL here, since after the destructor
finishes the class won't be accessible anymore.

> +      }
> +   }
> +
> +   int serialize(memory_writer &blob)
>

As with the grow() and write() functions above, I think this should return
void.  However, if we do decide to give it a return value, it should be a
bool, with true indicating success.

> +   {
> +      if (!data)
> +         return -1;
>

We never expect to call serialize() on string_data whose data is NULL,
right?  If so, this should be an assertion:

assert(data != NULL);

> +
> +      blob.write(&len);
> +      blob.write(data, 1, len);
> +      return 0;
> +   }
> +
> +   void set(const char *cstr)
> +   {
> +      if (data)
> +         free(data);
>

With one exception (which I have a suggestion about changing, see
ir_cache_variable_data below), it looks like we only ever call set() in
circumstances where data is NULL.  I'd suggest changing this to:

assert(data == NULL);

> +      data = _mesa_strdup(cstr);
> +      len = strlen(cstr);
> +   }
> +
> +   uint32_t len;
> +   char *data;
>

These two data members should be private.  Currently their only users
outside of the class are glsl_type_data::serialize() and the
ir_cache_variable_data constructor, both of which can be easily changed not
to access them.

> +};
> +
> +
> +/* data required to serialize glsl_type */
> +struct glsl_type_data
>

Can you help me understand why it's necessary to have a separate class for
this, rather than simply give glsl_type a serialize() method?  At the
moment this seems like needless copying.

> +{
> +public:
> +   glsl_type_data() :
> +      name(NULL),
> +      element_type(NULL),
> +      field_names(NULL),
> +      field_types(NULL),
> +      field_major(NULL) {}
> +
> +   glsl_type_data(const glsl_type *t) :
> +      base_type(t->base_type),
> +      length(t->length),
> +      vector_elms(t->vector_elements),
> +      matrix_cols(t->matrix_columns),
> +      sampler_dimensionality(t->sampler_dimensionality),
> +      sampler_shadow(t->sampler_shadow),
> +      sampler_array(t->sampler_array),
> +      sampler_type(t->sampler_type),
> +      interface_packing(t->interface_packing),
> +      element_type(NULL),
> +      field_names(NULL),
> +      field_types(NULL),
> +      field_major(NULL)
>

These inline functions are starting to get large for my taste.  Starting
with this class, let's move them into the .cpp file.  If we later find that
it's necessary to inline some of them for performance reasons, we can
always change them back to inline.  But I doubt this will ever be a hot
enough code path for that to be necessary.

> +   {
> +      name = new string_data(t->name);
> +
> +      /* for array, save element type information */
> +      if (t->base_type == GLSL_TYPE_ARRAY)
> +         element_type =
> +            new glsl_type_data(t->element_type());
> +
> +      /* with structs, copy each struct field name + type */
> +      else if (t->base_type == GLSL_TYPE_STRUCT) {
>

Don't we need to handle interface block types as well?  They have
additional information in glsl_struct_field that's important (location,
interpolation, and centroid).

> +
> +         field_names = new string_data[t->length];
> +         field_types = new glsl_type_data*[t->length];
> +         field_major = new uint32_t[t->length];
> +
> +         glsl_struct_field *field = t->fields.structure;
> +         glsl_type_data **field_t = field_types;
> +         for (unsigned k = 0; k < t->length; k++, field++, field_t++) {
> +            field_names[k].set(field->name);
> +            *field_t = new glsl_type_data(field->type);
> +            field_major[k] = field->row_major;
> +         }
> +      }
> +   }
> +
> +   ~glsl_type_data() {
> +      delete name;
> +      delete element_type;
> +      delete [] field_names;
> +      delete [] field_major;
> +      if (field_types) {
> +          struct glsl_type_data **data = field_types;
> +          for (int k = 0; k < length; k++, data++)
> +             delete *data;
> +          delete [] field_types;
> +      }
> +   }
> +
> +   int serialize(memory_writer &blob)
> +   {
> +      uint32_t ir_len = 666;
> +      blob.write(&name->len);
> +      blob.write(name->data, 1, name->len);
>

These two lines can be replaced with:

name->serialize(blob);

> +
> +      int32_t start_pos = blob.position();
> +      blob.write(&ir_len);
>

It seeems like this pattern crops up a lot: write out a bogus length value,
keep track of its position, and then later overwrite that with the true
length once the rest of the data has been written.  How about if we make a
class that does all this work?  I'm thinking something like:

struct memory_writer
{
   ...
   class scoped_length
   {
   public:
      explicit scoped_length(memory_writer *writer)
         : writer(writer)
      {
         uint32_t dummy;
         writer->write(&dummy);
         start_pos = writer->position();
      }

      ~scoped_length()
      {
         uint32_t len = writer->position() - start_pos;
         writer->write(&len, sizeof(len), 1, start_pos);
      }

   private:
      memory_writer *writer;
      uint32_t start_pos;
   };
   ...
};

And then any time you need to serialize the length of some data that
follows you can just do:

{
   memory_writer::scoped_length len(&blob); /* reserves space for the
length */
   ... /* write out variable length data */
} /* when len goes out of scope, the proper length is automatically written
*/

> +
> +      blob.write(this, sizeof(*this), 1);
> +
> +      if (base_type == GLSL_TYPE_ARRAY)
> +         element_type->serialize(blob);
> +      else if (base_type == GLSL_TYPE_STRUCT) {
>

As above, I'm worried that we may have to handle interface block types as
well.

> +         struct string_data *data = field_names;
> +         glsl_type_data **field_t = field_types;
> +         for (int k = 0; k < length; k++, data++, field_t++) {
> +            data->serialize(blob);
> +            (*field_t)->serialize(blob);
> +            blob.write(&field_major[k]);
> +         }
> +      }
> +
> +      ir_len = blob.position() - start_pos - sizeof(ir_len);
> +      blob.write(&ir_len, sizeof(ir_len), 1, start_pos);
> +      return 0;
> +   }
> +
> +   int32_t base_type;
> +   int32_t length;
> +   int32_t vector_elms;
> +   int32_t matrix_cols;
> +
> +   uint32_t sampler_dimensionality;
> +   uint32_t sampler_shadow;
> +   uint32_t sampler_array;
> +   uint32_t sampler_type;
> +
> +   uint32_t interface_packing;
> +
> +   struct string_data *name;
> +
> +   /* array element type */
> +   struct glsl_type_data *element_type;
> +
> +   /* structure fields */
> +   struct string_data *field_names;
> +   struct glsl_type_data **field_types;
> +   uint32_t *field_major;
> +};
> +
> +
> +/* helper to create a unique id from a ir_variable address */
> +static uint32_t _unique_id(ir_variable *var)
> +{
> +   char buffer[256];
> +   _mesa_snprintf(buffer, 256, "%s_%p", var->name, var);
> +   return _mesa_str_checksum(buffer);
>

Two problems:

1. This is a lot of work to obtain a unique identifier.  The pointer is
already unique; we should just use that.

2. There's no guarantee that the checksum of the string will be unique.
Because of the birthday paradox, checksum collisions are probably going to
happen all the time.

Instead of going to a lot of work to generate a unique id for each
ir_variable, I think we should just serialize the ir_variable pointer
address.  If we are trying to use the same binary format for 32-bit and
64-bit builds (see my clarifying question at the top of this email), we'll
need to extend the pointer to 64 bits before serializing it.

Alternatively, if you want to keep the unique ID's small, we'll have to use
some hashtable mechanism to map each variable to its own ID.

> +}
> +
> +
> +/* data required to serialize ir_variable */
> +struct ir_cache_variable_data
>

As with glsl_type_data, I don't understand why we need to have this class
rather than just adding a serialize() method to the ir_variable class.

> +{
> +public:
> +   ir_cache_variable_data() :
> +      type(NULL),
> +      name(NULL),
> +      unique_name(NULL),
> +      state_slots(NULL) {}
> +
> +   ir_cache_variable_data(ir_variable *ir) :
> +      unique_id(_unique_id(ir)),
> +      max_array_access(ir->max_array_access),
> +      ir_type(ir->ir_type),
> +      mode(ir->mode),
> +      location(ir->location),
> +      read_only(ir->read_only),
> +      centroid(ir->centroid),
> +      invariant(ir->invariant),
> +      interpolation(ir->interpolation),
> +      origin_upper_left(ir->origin_upper_left),
> +      pixel_center_integer(ir->pixel_center_integer),
> +      explicit_location(ir->explicit_location),
> +      explicit_index(ir->explicit_index),
> +      explicit_binding(ir->explicit_binding),
> +      has_initializer(ir->has_initializer),
> +      depth_layout(ir->depth_layout),
> +      location_frac(ir->location_frac),
> +      num_state_slots(ir->num_state_slots),
> +      has_constant_value(ir->constant_value ? 1 : 0),
> +      has_constant_initializer(ir->constant_initializer ? 1 : 0)
> +   {
> +      name = new string_data(ir->name);
> +
> +      /* name can be NULL, see ir_print_visitor for explanation */
> +      if (!ir->name)
> +         name->set("parameter");
>

To avoid having to call set() on a non-NULL string_data, I think we should
change this to:

const char *non_null_name = ir->name ? ir->name : "parameter";
name = new string_data(non_null_name);

> +
> +      char uniq[256];
> +      _mesa_snprintf(uniq, 256, "%s_%d", name->data, unique_id);
>

To avoid having to access an element of a string_data that ought to be
private, I think we should change this to:

_mesa_snprintf(uniq, 256, "%s_%d", non_null_name, unique_id);

(Note, however, that if we decide to use a pointer as the unique ID, this
will have to change to ensure that on 64-bit builds, the entire pointer is
included in the printf)

Two additional problems:

1. If the name is too long, uniq won't get null terminated, and the call to
string_data(uniq) below will go outside its bounds.  Since the name comes
from the client-supplied shader, this could conceivably lead to a security
exploit.

2. If the name is too long, unique_name won't be guaranteed to be unique.

I'd recommend just using ralloc_asprintf(), which allocates a string of the
appropriate length.

> +      unique_name = new string_data(uniq);
> +      type = new glsl_type_data(ir->type);
> +
> +      state_slots = (ir_state_slot *) malloc
> +         (ir->num_state_slots * sizeof(ir->state_slots[0]));
> +      memcpy(state_slots, ir->state_slots,
> +         ir->num_state_slots * sizeof(ir->state_slots[0]));
> +   }
> +
> +   ~ir_cache_variable_data()
> +   {
> +      delete name;
> +      delete unique_name;
> +      delete type;
> +
> +      if (state_slots)
> +         free(state_slots);
> +   }
> +
> +   int serialize(memory_writer &blob)
> +   {
> +      type->serialize(blob);
> +      name->serialize(blob);
> +      unique_name->serialize(blob);
> +      blob.write(this, sizeof(*this), 1);
> +
> +      for (unsigned i = 0; i < num_state_slots; i++) {
> +         blob.write(&state_slots[i].swizzle);
> +         for (unsigned j = 0; j < 5; j++) {
> +            blob.write(&state_slots[i].tokens[j]);
> +         }
> +      }
> +      return 0;
> +   }
> +
> +   struct glsl_type_data *type;
> +   struct string_data *name;
> +   struct string_data *unique_name;
> +
> +   uint32_t unique_id;
> +
> +   uint32_t max_array_access;
> +   int32_t ir_type;
> +   int32_t mode;
> +   int32_t location;
> +   uint32_t read_only;
> +   uint32_t centroid;
> +   uint32_t invariant;
> +   uint32_t interpolation;
> +   uint32_t origin_upper_left;
> +   uint32_t pixel_center_integer;
> +   uint32_t explicit_location;
> +   uint32_t explicit_index;
> +   uint32_t explicit_binding;
> +   uint8_t has_initializer;
> +   int32_t depth_layout;
> +   uint32_t location_frac;
> +
> +   uint32_t num_state_slots;
> +   struct ir_state_slot *state_slots;
> +
> +   uint8_t has_constant_value;
> +   uint8_t has_constant_initializer;
> +};
> +
> +
> +/* helper class to read instructions */
>

This comment should be expanded to explain that this class wraps around the
posix functions for mapping a file into the process's address space.

> +struct memory_map
> +{
> +public:
> +   memory_map() :
> +      fd(0),
> +      cache_size(0),
> +      cache_mmap(NULL),
> +      cache_mmap_p(NULL) { }
> +
> +   /* read from disk */
> +   int map(const char *path)
>

Return value should be a boolean, with true indicating success.

> +   {
> +      struct stat stat_info;
> +      if (stat(path, &stat_info) != 0)
> +         return -1;
> +
> +      cache_size = stat_info.st_size;
> +
> +      fd = open(path, O_RDONLY);
> +      if (fd) {
> +         cache_mmap_p = cache_mmap = (char *)
> +            mmap(NULL, cache_size, PROT_READ, MAP_PRIVATE, fd, 0);
>

Will this work on Windows?

> +         return (cache_mmap == MAP_FAILED) ? -1 : 0;
> +      }
> +      return -1;
> +   }
> +
> +   /* read from memory */
> +   int map(const void *memory, size_t size)
> +   {
> +      cache_mmap_p = cache_mmap = (char *) memory;
> +      cache_size = size;
> +      return 0;
> +   }
> +
> +   ~memory_map() {
> +      if (cache_mmap) {
> +         munmap(cache_mmap, cache_size);
> +         close(fd);
> +      }
> +   }
> +
> +   /* move read pointer forward */
> +   inline void ffwd(int len)
> +   {
> +      cache_mmap_p += len;
> +   }
> +
> +   /* position of read pointer */
> +   inline uint32_t position()
> +   {
> +      return cache_mmap_p - cache_mmap;
> +   }
> +
> +   inline void read_string(char *str)
> +   {
> +      uint32_t len;
> +      read(&len);
> +      memcpy(str, cache_mmap_p, len);
> +      str[len] = '\0';
> +      ffwd(len);
>

If the string is larger than the caller expects (or the contents of the
shader binary are corrupted), this function may overwrite arbitrarily large
areas of memory.  Since the shader binary may be supplied by client code
(in the case of OES_get_program_binary), this is a security risk.

I can think of two possible ways of fixing this:

1. Change this function so that it copies the string to a newly allocated
chunk of memory.

2. Change the format of the shader binary so that strings are stored as
null-terminated (rather than length followed by string data); that way this
function can simply return cache_mmap_p and fast forward to the next byte
after the end of the string.

I'd lean in favor of #2.

> +   }
> +
> +   /* read functions for different types */
> +#define _READ_TYPE(type) inline void read(type *val) {\
> +   *val = *(type *) cache_mmap_p;\
> +   ffwd(sizeof(type));\
> +}
> +   _READ_TYPE(int32_t)
> +   _READ_TYPE(uint32_t)
> +   _READ_TYPE(bool)
> +   _READ_TYPE(GLboolean)
> +   _READ_TYPE(gl_texture_index)
> +   _READ_TYPE(ir_expression_operation)
> +
> +   inline void read(void *dst, size_t size)
> +   {
> +      memcpy(dst, cache_mmap_p, size);
> +      ffwd(size);
> +   }
> +
> +   /* read and serialize a gl_shader */
> +   inline struct gl_shader *read_shader(void *mem_ctx,
> +      const char *mesa_sha, size_t size)
> +   {
> +      struct gl_shader *sha = _mesa_shader_unserialize(mem_ctx,
> +         cache_mmap_p, mesa_sha, size);
> +      ffwd(size);
> +      return sha;
> +   }
>

It surprises me to see us fast-forward by a caller-supplied size here.
Shouldn't we encode the size in the first 4 bytes of the shader binary?

> +
> +private:
> +
> +   int32_t fd;
> +   int32_t cache_size;
> +   char *cache_mmap;
> +   char *cache_mmap_p;
> +};
> +
> +
> +/* class to read and write gl_shader */
>

This is one of the cases where I need more information in the comment above
the class definition.  Is this a class that should be created once per
context, and represents a proxy to an on-disk cache of shaders?  Or is it a
class that should be created once per load/save of a single shader, to aid
in saving/loading the shader?  I'm guessing it's the latter, because
prototypes_only seems like the sort of thing that will change from one
serialization/deserialization to the next.

Perhaps a clearer name for this class would be something like
"ir_serializer"?

Also, it looks like there's very little crossover between the code needed
to serialize a shader and the code needed to deserialize a shader.  Perhaps
this should be split into two classes, ir_serializer and ir_deserializer.

> +struct ir_cache
> +{
> +public:
> +   ir_cache(bool prototypes = false) :
> +      prototypes_only(prototypes)
> +   {
> +      var_ht = hash_table_ctor(0, hash_table_string_hash,
> +         hash_table_string_compare);
> +   }
> +
> +   ~ir_cache()
> +   {
> +      hash_table_call_foreach(this->var_ht, delete_key, NULL);
> +      hash_table_dtor(this->var_ht);
> +   }
> +
> +   /* serialize gl_shader to memory  */
> +   char *serialize(struct gl_shader *shader,
> +      struct _mesa_glsl_parse_state *state,
> +      const char *mesa_sha, size_t *size);
>

Style nit-pick: the arguments should be lined up in the column after the
opening paren, like this:

   char *serialize(struct gl_shader *shader,
                   struct _mesa_glsl_parse_state *state,
                   const char *mesa_sha, size_t *size);

> +
> +   /* unserialize gl_shader from mapped memory */
> +   struct gl_shader *unserialize(void *mem_ctx, memory_map &map,
> +      uint32_t shader_size,
> +      struct _mesa_glsl_parse_state *state,
> +      const char *mesa_sha,
> +      int *error_code);
> +
> +   enum cache_error {
> +      GENERAL_READ_ERROR = -1,
> +      DIFFERENT_MESA_SHA = -2,
> +      DIFFERENT_LANG_VER = -3,
> +   };
> +
> +   /**
> +    * this method is public so that gl_shader_program
> +    * unserialization can use it when reading the
> +    * uniform storage
> +    */
> +   const glsl_type *read_glsl_type(memory_map &map,
> +      struct _mesa_glsl_parse_state *state);
> +
> +private:
> +
> +   /* variables and methods required for serialization */
> +
> +   memory_writer blob;
> +
> +   bool prototypes_only;
> +
> +   /**
> +    * writes ir_type and instruction dump size as a 'header'
> +    * for each instruction before calling save_ir
> +    */
> +   int save(ir_instruction *ir);
> +
> +   int save_ir(ir_variable *ir);
> +   int save_ir(ir_assignment *ir);
> +   int save_ir(ir_call *ir);
> +   int save_ir(ir_constant *ir);
> +   int save_ir(ir_dereference_array *ir);
> +   int save_ir(ir_dereference_record *ir);
> +   int save_ir(ir_dereference_variable *ir);
> +   int save_ir(ir_discard *ir);
> +   int save_ir(ir_expression *ir);
> +   int save_ir(ir_function *ir);
> +   int save_ir(ir_function_signature *ir);
> +   int save_ir(ir_if *ir);
> +   int save_ir(ir_loop *ir);
> +   int save_ir(ir_loop_jump *ir);
> +   int save_ir(ir_return *ir);
> +   int save_ir(ir_swizzle *ir);
> +   int save_ir(ir_texture *ir);
> +   int save_ir(ir_emit_vertex *ir);
> +   int save_ir(ir_end_primitive *ir);
> +
> +
> +   /* variables and methods required for unserialization */
> +
> +   struct _mesa_glsl_parse_state *state;
> +   void *mem_ctx;
> +
> +   struct exec_list *top_level;
> +   struct exec_list *prototypes;
> +   struct exec_list *current_function;
>

These private data members need to have documentation comments.  In
particular, since an exec_list can in principle store anything that derives
from exec_node, the comments should explain what kind of data is stored in
each list.

> +
> +   int read_header(struct gl_shader *shader, memory_map &map,
> +      const char *mesa_sha);
> +   int read_prototypes(memory_map &map);
> +
> +   int read_instruction(struct exec_list *list, memory_map &map,
> +      bool ignore = false);
> +
> +   int read_ir_variable(struct exec_list *list, memory_map &map);
> +   int read_ir_assignment(struct exec_list *list, memory_map &map);
> +   int read_ir_function(struct exec_list *list, memory_map &map);
> +   int read_ir_if(struct exec_list *list, memory_map &map);
> +   int read_ir_return(struct exec_list *list, memory_map &map);
> +   int read_ir_call(struct exec_list *list, memory_map &map);
> +   int read_ir_discard(struct exec_list *list, memory_map &map);
> +   int read_ir_loop(struct exec_list *list, memory_map &map);
> +   int read_ir_loop_jump(struct exec_list *list, memory_map &map);
> +   int read_emit_vertex(struct exec_list *list, memory_map &map);
> +   int read_end_primitive(struct exec_list *list, memory_map &map);
> +
> +   /* rvalue readers */
> +   ir_rvalue *read_ir_rvalue(memory_map &map);
> +   ir_constant *read_ir_constant(memory_map &map,
> +      struct exec_list *list = NULL);
> +   ir_swizzle *read_ir_swizzle(memory_map &map);
> +   ir_texture *read_ir_texture(memory_map &map);
> +   ir_expression *read_ir_expression(memory_map &map);
> +   ir_dereference_array *read_ir_dereference_array(memory_map &map);
> +   ir_dereference_record *read_ir_dereference_record(memory_map &map);
> +   ir_dereference_variable *read_ir_dereference_variable(memory_map &map);
> +
> +   /**
> +    * var_ht is used to store created ir_variables with a unique_key for
> +    * each so that ir_dereference_variable creation can find the variable
> +    */
> +   struct hash_table *var_ht;
> +
> +   /**
> +    * these 2 functions are ~copy-pasta from string_to_uint_map,
>

I think you mean "copy-pasted" :)

> +    * we need a hash here with a string key and pointer value
> +    */
> +   void hash_store(void * value, const char *key)
> +   {
> +      char *dup_key = _mesa_strdup(key);
> +      bool result = hash_table_replace(this->var_ht, value, dup_key);
> +      if (result)
> +         free(dup_key);
> +   }
> +
> +   static void delete_key(const void *key, void *data, void *closure)
> +   {
> +      (void) data;
> +      (void) closure;
> +      free((char *)key);
> +   }
> +
> +};
> +#endif /* ifdef __cplusplus */
> +
> +#endif /* IR_CACHE_H */
>

Unfortunately, I'm out of time to review this patch series today.  I'm not
sure when I'll be able to get back to it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131028/e89dc3e8/attachment-0001.html>