[Mesa-dev] [PATCH 3/4] glsl: add ir_cache class and functions for shader serialization

Tue Oct 29 08:53:07 CET 2013

Hi;

Thanks for the review, some replies below ..

On 10/28/2013 10:09 PM, Paul Berry wrote:
> Before we start, I have a clarifying question: Is the binary format 
> intended to be the same for 32- and 64-bit builds, or different?  At 
> least one of my comments below will depend on the answer to this question.
>

It's intended to be the same, I've tried to avoid of using long or 
pointer addresses.

> On 24 October 2013 01:28, Tapani Pälli <tapani.palli at intel.com 
> <mailto:tapani.palli at intel.com>> wrote:
>
>     +/* cache specific debug output */
>     +#ifdef SHADER_CACHE_DEBUG
>     +#define CACHE_DEBUG(fmt, args...) printf(fmt, ## args)
>     +#else
>     +#define CACHE_DEBUG(fmt, args...) do {} while (0)
>     +#endif
>
>
> This is the gcc-specific variadic argument syntax. To be compatible 
> with MSVC, we have to use the C99 syntax, which is:
>
> #ifdef SHADER_CACHE_DEBUG
> #define CACHE_DEBUG(fmt, ...) printf(fmt, ## __VA_ARGS__)
> #else
> #define CACHE_DEBUG(fmt, ...) do {} while (0)
> #endif
>

ok, will fix

>     +
>     +/* C API for the cache */
>     +#ifdef __cplusplus
>     +#define _EXTC extern "C"
>     +#else
>     +#define _EXTC
>     +#endif
>     +
>     +_EXTC char *_mesa_shader_serialize(struct gl_shader *shader,
>     +   struct _mesa_glsl_parse_state *state,
>     +   const char *mesa_sha, size_t *size);
>     +
>     +_EXTC struct gl_shader *_mesa_shader_unserialize(void *mem_ctx,
>     +   void *blob, const char *mesa_sha, size_t size);
>     +
>     +_EXTC char *_mesa_program_serialize(struct gl_shader_program *prog,
>     +   size_t *size, const char *mesa_sha);
>     +
>     +_EXTC int _mesa_program_unserialize(struct gl_shader_program *prog,
>     +   const GLvoid *blob, size_t size, const char *mesa_sha);
>
>
> This _EXTC stuff is nonstandard.  What we typically do is to surround 
> the block with
>
> #ifdef __cplusplus
> extern "C" {
> #endif
>
> and
>
> #ifdef __cplusplus
> } /* extern "C" */
> #endif
>
> See, for example, src/glsl/program.h.

This is what I originally did but with many functions it looks quite 
ugly, this is why I wanted to try to make it a bit more compact. But I 
can switch it back.

>     +
>     +#ifdef __cplusplus
>     +
>     +
>     +/* helper class for writing data to memory */
>
>
> Please use doxygen-formatted comments for documenting structs, 
> classes, functions, and struct/class members, i.e.:
>
> /**
>  * blah blah blah
>  */
>
> Also, comments of the form "helper class for XXX" are generally hard 
> to follow.  We really need a comment explaining what the class does.  
> For example, in this case, maybe it should be something like:
>
> /**
>  * This class maintains a dynamically-sized memory buffer and allows 
> for data
>  * to be efficiently appended to it with automatic resizing.
>  */
>

Thanks, will do. I agree that current comments don't tell enough.

>     +struct memory_writer
>
>
> Let's make this a class rather than a struct to clarify that it's C++ 
> only.

ok

>     +{
>     +public:
>     +   memory_writer() :
>     +      memory(NULL),
>     +      memory_p(NULL),
>     +      curr_size(0),
>     +      pos(0)
>     +   { }
>     +
>     +   /* NOTE - there is no dtor to free memory, user is responsible */
>     +   void free_memory()
>     +   {
>     +      if (memory)
>     +         free(memory);
>     +   }
>
>
> I'm assuming the reason you did this rather than write a destructor 
> was that you wanted the option of having the user of the class claim 
> ownership of the memory by calling mem()?
>
> The safer way to do this is to go ahead and write the destructor, and 
> change mem() to something like this:
>
> inline char *release_memory(size_t *size)
> {
>    char *result = memory;
>    *size = pos;
>    memory = NULL;
>    memory_p = NULL;
>    curr_size = 0;
>    pos = 0;
>    return result;
> }
>
> That way there is no danger of someone forgetting to call 
> free_memory() (or calling it when they shouldn't).

good point, I will change this.

>     +
>     +   /* realloc more memory */
>     +   int grow(int32_t size)
>
>
> This function should be private.

agreed, it was originally public just for debug purposes and mistakenly 
left public.

>     +   {
>     +      char *more_mem = (char *) realloc(memory, curr_size + size);
>
>
> Growing the allocation to just curr_size + size will give this class 
> O(n^2) performance (since in general, every call to write() will 
> require a call to grow(), which may need to memcpy all of the 
> previously-written data).  What you need to do is something like this 
> (ignoring error handling):
>
> size_t new_size = 2 * (curr_size + size);
> char *more_mem = (char *) realloc(memory, new_size);
> memory = more_mem;
> memory_p = memory + pos;
> curr_size = new_size;
>
> By doubling the size on each call to grow() we guarantee that we won't 
> have to call realloc() more than log(n) times, so the total runtime of 
> all the realloc's will be bounded by O(n).
>

This is taken care by the caller which has some 'extra' on top to help 
this but 2 * sounds fine for me. I thought this is more of a concern of 
the caller and not this class but it is true that having it here makes 
caller's life more simpler.

> Note: if we do this, then sometimes when we're done serializing we'll 
> have allocated more memory than we need.  To avoid wasting memory over 
> the long term we should probably do a final realloc() down to the 
> exact size needed.  Assuming you take my advice about release_memory() 
> above, an easy way to do that would be to do a final realloc() in 
> release_memory().  Note that this final realloc() will be efficient 
> because it's guaranteed not to need to memcpy the data.

Thanks, makes sense. Currently the extra memory is there but only for 
quite a short time.

>     +      if (more_mem == NULL) {
>     +         free(memory);
>     +         memory = NULL;
>     +         return -1;
>     +      } else {
>     +         memory = more_mem;
>     +         memory_p = memory + pos;
>     +         curr_size += size;
>     +         return 0;
>     +      }
>     +   }
>
>
> Generally in Mesa we don't worry too much about memory allocation 
> failures, since in an out of memory condition it's not really feasible 
> to recover.  I'd recommend eliminating the out-of-memory-handling code 
> and changing the return type of grow() (and the write() functions) to 
> void.
>
> If we do decide to keep the out-of-memory handling, then:
> - return type of grow() and write() should be bool, and they should 
> return true on success.
> - we need to change all the code that calls write() to make sure that 
> it properly checks the return value.

OK, I'm fine with removing out-of-memory handling.

>     +
>     +   /* write functions for different types */
>     +#define _WRITE_TYPE(type) inline int write(type *val) {\
>     +   return write(val, sizeof(*val), 1, -1);\
>     +}
>     +
>     +   _WRITE_TYPE(uint8_t)
>     +   _WRITE_TYPE(int32_t)
>     +   _WRITE_TYPE(uint32_t)
>     +   _WRITE_TYPE(bool)
>     +   _WRITE_TYPE(gl_texture_index)
>     +   _WRITE_TYPE(ir_node_type)
>     +   _WRITE_TYPE(ir_loop_jump::jump_mode)
>     +   _WRITE_TYPE(ir_expression_operation)
>     +   _WRITE_TYPE(GLbitfield64)
>
>
> Because of implicit type conversions, these definitions essentially 
> allow any integer type to be passed to write(), and that increases the 
> risk of mistakes.  For example, I believe this would be allowed, in 
> both 32 and 64 bit builds:
>
> size_t s;
> write(s);
>
> But in 32 bit builds it'll resolve to _WRITE_TYPE(uint32_t) and in 64 
> bit builds it'll resolve to _WRITE_TYPE(GLbitfield64).
>
> I'd prefer if we just made the type explicit in each write function, 
> i.e. write_uint8(), write_int32(), write_bool(), and so on.  It's a 
> little more typing at the call site, but it carries the advantage that 
> we always know exactly what we're writing.

I'll open them up, Eric mentioned of this as well. I was hoping to get 
around it (either with templates or macros) because it will make code 
harder to maintain as these types and structures might change any day.

>     +
>     +   int write(const void *data, int32_t size, int32_t nmemb,
>     +      int32_t offset = -1)
>
>
> It looks like nearly all callers set either size or nmemb to 1.  I 
> think the code would be easier to follow if we just had a single size 
> argument, and in the rare cases where we are serializing multiple 
> objects whose size != 1, just do the multiply in the caller.

Fair enough, this can be simplified. I did this to stay closer to fwrite 
api which this code was originally using.

>     +   {
>     +      int32_t amount = size * nmemb;
>     +      int32_t extra = 8192;
>     +
>     +      /* reallocate more if does not fix to current memory */
>     +      if (!memory || pos > (int32_t)(curr_size - amount))
>     +         if (grow(amount + extra))
>     +            return -1;
>
>
> For the case where offset != -1, this will do the wrong thing (it will 
> try to reserve memory when it's not necessary to do so).  It looks 
> like in all cases where offset != -1, the necessary memory is 
> guaranteed to be already reserved (because we are overwriting a value 
> we wrote previously) so I'd recommend that in this case we simply do:
>
> assert(amount + extra <= pos);
>
> Also, given how differently this function needs to behave when offset 
> != -1, and given how infrequently it is called with an offset other 
> than -1, I'd recommend breaking the offset != -1 case out to its own 
> function (maybe call it overwrite()).

True, I did not think this case through. I will write it as separate 
function.

>     +
>     +      /**
>     +       * by default data is written to pos memory_p, however
>     +       * if an offset value >= 0 is passed then that value is
>     +       * used as offset from start of the memory blob
>     +       */
>     +      char *dst = memory_p;
>     +
>     +      if (offset > -1)
>     +         dst = memory + offset;
>     +
>     +      memcpy(dst, data, amount);
>     +
>     +      /* if no offset given, forward the pointer */
>     +      if (offset == -1) {
>     +         memory_p += amount;
>     +         pos += amount;
>     +      }
>     +      return 0;
>     +   }
>     +
>     +   int write_string(const char *str)
>     +   {
>     +      if (!str)
>     +         return -1;
>     +      uint32_t len = strlen(str);
>     +      write(&len);
>     +      write(str, 1, len);
>     +      return 0;
>     +   }
>     +
>     +   inline int32_t position() { return pos; }
>     +   inline char *mem() { return memory; }
>     +
>     +private:
>     +   char *memory;
>     +   char *memory_p;
>     +   int32_t curr_size;
>     +   int32_t pos;
>
>
> Please add a short comment above each of these private data members 
> explaining what it stores.
>
> I'm not comfortable with storing both memory_p and pos in this 
> class--they are redundant, so it means we have to always double check 
> that we keep the two of them in sync.  I'd suggest getting rid of 
> memory_p, since that looks like it would require the smallest number 
> of changes to the rest of the class.

OK, will fix.

>     +};
>     +
>     +
>     +/* helper for string serialization */
>
>
> Maybe this comment could be something like:
>
> /**
>  * Wrapper for a C string that manages allocation and deallocation of the
>  * underlying memory.
>  */
>

Thanks, I'll go through the code and try to improve documentation in 
general.

>     +struct string_data
>
>
> This should also be a class.
>
>     +{
>     +public:
>     +   string_data() : len(0), data(NULL) {}
>     +   string_data(const char *cstr) :
>     +      data(NULL)
>     +   {
>     +      if (cstr) {
>
>
> It looks like we almost never expect to pass NULL to the string_data 
> constructor (other than by mistake).  I'd prefer to change things so 
> that we never pass NULL to the string_data constructor, and get rid of 
> this test, so that if we ever do pass NULL to string_data by mistake, 
> we'll notice the bug as quickly as possible.

Fine, will change.

>     +         len = strlen(cstr);
>     +         data = _mesa_strdup(cstr);
>     +      }
>     +   }
>     +
>     +   ~string_data() {
>     +      if (data) {
>     +         free(data);
>     +         data = NULL;
>
>
> It's not necessary to set data to NULL here, since after the 
> destructor finishes the class won't be accessible anymore.

ok

>     +      }
>     +   }
>     +
>     +   int serialize(memory_writer &blob)
>
>
> As with the grow() and write() functions above, I think this should 
> return void.  However, if we do decide to give it a return value, it 
> should be a bool, with true indicating success.

I was hoping to use the 'int' everywhere as then I can specify not only 
failure but also error code if wanted.

>     +   {
>     +      if (!data)
>     +         return -1;
>
>
> We never expect to call serialize() on string_data whose data is NULL, 
> right?  If so, this should be an assertion:
>
> assert(data != NULL);

ok

>     +
>     +      blob.write(&len);
>     +      blob.write(data, 1, len);
>     +      return 0;
>     +   }
>     +
>     +   void set(const char *cstr)
>     +   {
>     +      if (data)
>     +         free(data);
>
>
> With one exception (which I have a suggestion about changing, see 
> ir_cache_variable_data below), it looks like we only ever call set() 
> in circumstances where data is NULL.  I'd suggest changing this to:
>
> assert(data == NULL);

ok, can be changed.

>     +      data = _mesa_strdup(cstr);
>     +      len = strlen(cstr);
>     +   }
>     +
>     +   uint32_t len;
>     +   char *data;
>
>
> These two data members should be private. Currently their only users 
> outside of the class are glsl_type_data::serialize() and the 
> ir_cache_variable_data constructor, both of which can be easily 
> changed not to access them.

I thought this more as a little helper / container and not a *real* 
class but I can change this.

>     +};
>     +
>     +
>     +/* data required to serialize glsl_type */
>     +struct glsl_type_data
>
>
> Can you help me understand why it's necessary to have a separate class 
> for this, rather than simply give glsl_type a serialize() method?  At 
> the moment this seems like needless copying.

I wanted to try to serialize everything as a struct to make the 
serialize and unserialize code 'more compact'. It is true that this 
could be just a function like serialization of ir instructions are.

>     +{
>     +public:
>     +   glsl_type_data() :
>     +      name(NULL),
>     +      element_type(NULL),
>     +      field_names(NULL),
>     +      field_types(NULL),
>     +      field_major(NULL) {}
>     +
>     +   glsl_type_data(const glsl_type *t) :
>     +      base_type(t->base_type),
>     +      length(t->length),
>     +      vector_elms(t->vector_elements),
>     +      matrix_cols(t->matrix_columns),
>     +  sampler_dimensionality(t->sampler_dimensionality),
>     +      sampler_shadow(t->sampler_shadow),
>     +      sampler_array(t->sampler_array),
>     +      sampler_type(t->sampler_type),
>     +      interface_packing(t->interface_packing),
>     +      element_type(NULL),
>     +      field_names(NULL),
>     +      field_types(NULL),
>     +      field_major(NULL)
>
>
> These inline functions are starting to get large for my taste.  
> Starting with this class, let's move them into the .cpp file.  If we 
> later find that it's necessary to inline some of them for performance 
> reasons, we can always change them back to inline. But I doubt this 
> will ever be a hot enough code path for that to be necessary.

Agreed. Originally I thought this to be much smaller class.

>     +   {
>     +      name = new string_data(t->name);
>     +
>     +      /* for array, save element type information */
>     +      if (t->base_type == GLSL_TYPE_ARRAY)
>     +         element_type =
>     +            new glsl_type_data(t->element_type());
>     +
>     +      /* with structs, copy each struct field name + type */
>     +      else if (t->base_type == GLSL_TYPE_STRUCT) {
>
>
> Don't we need to handle interface block types as well?  They have 
> additional information in glsl_struct_field that's important 
> (location, interpolation, and centroid).

Yes, this is actually missing implementation. I think I'm just missing a 
test case to implement this, current ones never hit interface path.

>     +
>     +         field_names = new string_data[t->length];
>     +         field_types = new glsl_type_data*[t->length];
>     +         field_major = new uint32_t[t->length];
>     +
>     +         glsl_struct_field *field = t->fields.structure;
>     +         glsl_type_data **field_t = field_types;
>     +         for (unsigned k = 0; k < t->length; k++, field++,
>     field_t++) {
>     +            field_names[k].set(field->name);
>     +            *field_t = new glsl_type_data(field->type);
>     +            field_major[k] = field->row_major;
>     +         }
>     +      }
>     +   }
>     +
>     +   ~glsl_type_data() {
>     +      delete name;
>     +      delete element_type;
>     +      delete [] field_names;
>     +      delete [] field_major;
>     +      if (field_types) {
>     +          struct glsl_type_data **data = field_types;
>     +          for (int k = 0; k < length; k++, data++)
>     +             delete *data;
>     +          delete [] field_types;
>     +      }
>     +   }
>     +
>     +   int serialize(memory_writer &blob)
>     +   {
>     +      uint32_t ir_len = 666;
>     +      blob.write(&name->len);
>     +      blob.write(name->data, 1, name->len);
>
>
> These two lines can be replaced with:
>
> name->serialize(blob);

yep, will change

>     +
>     +      int32_t start_pos = blob.position();
>     +      blob.write(&ir_len);
>
>
> It seeems like this pattern crops up a lot: write out a bogus length 
> value, keep track of its position, and then later overwrite that with 
> the true length once the rest of the data has been written.  How about 
> if we make a class that does all this work?  I'm thinking something like:
>
> struct memory_writer
> {
>    ...
>    class scoped_length
>    {
>    public:
>       explicit scoped_length(memory_writer *writer)
>          : writer(writer)
>       {
>          uint32_t dummy;
>          writer->write(&dummy);
>          start_pos = writer->position();
>       }
>
>       ~scoped_length()
>       {
>          uint32_t len = writer->position() - start_pos;
>          writer->write(&len, sizeof(len), 1, start_pos);
>       }
>
>    private:
>       memory_writer *writer;
>       uint32_t start_pos;
>    };
>    ...
> };
>
> And then any time you need to serialize the length of some data that 
> follows you can just do:
>
> {
>    memory_writer::scoped_length len(&blob); /* reserves space for the 
> length */
>    ... /* write out variable length data */
> } /* when len goes out of scope, the proper length is automatically 
> written */

I think this happens only in 3 places but I will try this, it looks cleaner.

>     +
>     +      blob.write(this, sizeof(*this), 1);
>     +
>     +      if (base_type == GLSL_TYPE_ARRAY)
>     +         element_type->serialize(blob);
>     +      else if (base_type == GLSL_TYPE_STRUCT) {
>
>
> As above, I'm worried that we may have to handle interface block types 
> as well.

Yep, missing implementation.

>     +         struct string_data *data = field_names;
>     +         glsl_type_data **field_t = field_types;
>     +         for (int k = 0; k < length; k++, data++, field_t++) {
>     +            data->serialize(blob);
>     +            (*field_t)->serialize(blob);
>     +            blob.write(&field_major[k]);
>     +         }
>     +      }
>     +
>     +      ir_len = blob.position() - start_pos - sizeof(ir_len);
>     +      blob.write(&ir_len, sizeof(ir_len), 1, start_pos);
>     +      return 0;
>     +   }
>     +
>     +   int32_t base_type;
>     +   int32_t length;
>     +   int32_t vector_elms;
>     +   int32_t matrix_cols;
>     +
>     +   uint32_t sampler_dimensionality;
>     +   uint32_t sampler_shadow;
>     +   uint32_t sampler_array;
>     +   uint32_t sampler_type;
>     +
>     +   uint32_t interface_packing;
>     +
>     +   struct string_data *name;
>     +
>     +   /* array element type */
>     +   struct glsl_type_data *element_type;
>     +
>     +   /* structure fields */
>     +   struct string_data *field_names;
>     +   struct glsl_type_data **field_types;
>     +   uint32_t *field_major;
>     +};
>     +
>     +
>     +/* helper to create a unique id from a ir_variable address */
>     +static uint32_t _unique_id(ir_variable *var)
>     +{
>     +   char buffer[256];
>     +   _mesa_snprintf(buffer, 256, "%s_%p", var->name, var);
>     +   return _mesa_str_checksum(buffer);
>
>
> Two problems:
>
> 1. This is a lot of work to obtain a unique identifier.  The pointer 
> is already unique; we should just use that.
>

I was originally using a pointer but then there's the 32 vs 64bit 
address problem. I wanted to avoid usage of pointers.

> 2. There's no guarantee that the checksum of the string will be 
> unique.  Because of the birthday paradox, checksum collisions are 
> probably going to happen all the time.
>

I think there is because the pointer address is unique and is used as 
part of the string.

> Instead of going to a lot of work to generate a unique id for each 
> ir_variable, I think we should just serialize the ir_variable pointer 
> address.  If we are trying to use the same binary format for 32-bit 
> and 64-bit builds (see my clarifying question at the top of this 
> email), we'll need to extend the pointer to 64 bits before serializing it.
>
> Alternatively, if you want to keep the unique ID's small, we'll have 
> to use some hashtable mechanism to map each variable to its own ID.

OK, I will verify if this is a problem. For me it does not seem like a 
problem as I'm using the address there.

>     +}
>     +
>     +
>     +/* data required to serialize ir_variable */
>     +struct ir_cache_variable_data
>
>
> As with glsl_type_data, I don't understand why we need to have this 
> class rather than just adding a serialize() method to the ir_variable 
> class.

This could be a serialize function in ir_cache_serialize.cpp. I did not 
want to touch existing ir classes, I think it helps to have the cache 
code separated and in one place. This is of course debatable, every ir 
instruction could have serialize() and unserialize() to deal with this.

>     +{
>     +public:
>     +   ir_cache_variable_data() :
>     +      type(NULL),
>     +      name(NULL),
>     +      unique_name(NULL),
>     +      state_slots(NULL) {}
>     +
>     +   ir_cache_variable_data(ir_variable *ir) :
>     +      unique_id(_unique_id(ir)),
>     +      max_array_access(ir->max_array_access),
>     +      ir_type(ir->ir_type),
>     +      mode(ir->mode),
>     +      location(ir->location),
>     +      read_only(ir->read_only),
>     +      centroid(ir->centroid),
>     +      invariant(ir->invariant),
>     +      interpolation(ir->interpolation),
>     +      origin_upper_left(ir->origin_upper_left),
>     +  pixel_center_integer(ir->pixel_center_integer),
>     +      explicit_location(ir->explicit_location),
>     +      explicit_index(ir->explicit_index),
>     +      explicit_binding(ir->explicit_binding),
>     +      has_initializer(ir->has_initializer),
>     +      depth_layout(ir->depth_layout),
>     +      location_frac(ir->location_frac),
>     +      num_state_slots(ir->num_state_slots),
>     +      has_constant_value(ir->constant_value ? 1 : 0),
>     +  has_constant_initializer(ir->constant_initializer ? 1 : 0)
>     +   {
>     +      name = new string_data(ir->name);
>     +
>     +      /* name can be NULL, see ir_print_visitor for explanation */
>     +      if (!ir->name)
>     +         name->set("parameter");
>
>
> To avoid having to call set() on a non-NULL string_data, I think we 
> should change this to:
>
> const char *non_null_name = ir->name ? ir->name : "parameter";
> name = new string_data(non_null_name);

ok

>     +
>     +      char uniq[256];
>     +      _mesa_snprintf(uniq, 256, "%s_%d", name->data, unique_id);
>
>
> To avoid having to access an element of a string_data that ought to be 
> private, I think we should change this to:
>
> _mesa_snprintf(uniq, 256, "%s_%d", non_null_name, unique_id);
>
> (Note, however, that if we decide to use a pointer as the unique ID, 
> this will have to change to ensure that on 64-bit builds, the entire 
> pointer is included in the printf)
>
> Two additional problems:
>
> 1. If the name is too long, uniq won't get null terminated, and the 
> call to string_data(uniq) below will go outside its bounds.  Since the 
> name comes from the client-supplied shader, this could conceivably 
> lead to a security exploit.
>

Yep, I've been thinking about this. There's many places in the code 
using hardcoded length. I did not check what is the max length, if any, 
specified in GLSL but there might be one.

> 2. If the name is too long, unique_name won't be guaranteed to be unique.
>
> I'd recommend just using ralloc_asprintf(), which allocates a string 
> of the appropriate length.
>

OK, I'll try using ralloc_asprintf().

>     +      unique_name = new string_data(uniq);
>     +      type = new glsl_type_data(ir->type);
>     +
>     +      state_slots = (ir_state_slot *) malloc
>     +         (ir->num_state_slots * sizeof(ir->state_slots[0]));
>     +      memcpy(state_slots, ir->state_slots,
>     +         ir->num_state_slots * sizeof(ir->state_slots[0]));
>     +   }
>     +
>     +   ~ir_cache_variable_data()
>     +   {
>     +      delete name;
>     +      delete unique_name;
>     +      delete type;
>     +
>     +      if (state_slots)
>     +         free(state_slots);
>     +   }
>     +
>     +   int serialize(memory_writer &blob)
>     +   {
>     +      type->serialize(blob);
>     +      name->serialize(blob);
>     +      unique_name->serialize(blob);
>     +      blob.write(this, sizeof(*this), 1);
>     +
>     +      for (unsigned i = 0; i < num_state_slots; i++) {
>     +         blob.write(&state_slots[i].swizzle);
>     +         for (unsigned j = 0; j < 5; j++) {
>     +  blob.write(&state_slots[i].tokens[j]);
>     +         }
>     +      }
>     +      return 0;
>     +   }
>     +
>     +   struct glsl_type_data *type;
>     +   struct string_data *name;
>     +   struct string_data *unique_name;
>     +
>     +   uint32_t unique_id;
>     +
>     +   uint32_t max_array_access;
>     +   int32_t ir_type;
>     +   int32_t mode;
>     +   int32_t location;
>     +   uint32_t read_only;
>     +   uint32_t centroid;
>     +   uint32_t invariant;
>     +   uint32_t interpolation;
>     +   uint32_t origin_upper_left;
>     +   uint32_t pixel_center_integer;
>     +   uint32_t explicit_location;
>     +   uint32_t explicit_index;
>     +   uint32_t explicit_binding;
>     +   uint8_t has_initializer;
>     +   int32_t depth_layout;
>     +   uint32_t location_frac;
>     +
>     +   uint32_t num_state_slots;
>     +   struct ir_state_slot *state_slots;
>     +
>     +   uint8_t has_constant_value;
>     +   uint8_t has_constant_initializer;
>     +};
>     +
>     +
>     +/* helper class to read instructions */
>
>
> This comment should be expanded to explain that this class wraps 
> around the posix functions for mapping a file into the process's 
> address space.

Yes, agreed.

>     +struct memory_map
>     +{
>     +public:
>     +   memory_map() :
>     +      fd(0),
>     +      cache_size(0),
>     +      cache_mmap(NULL),
>     +      cache_mmap_p(NULL) { }
>     +
>     +   /* read from disk */
>     +   int map(const char *path)
>
>
> Return value should be a boolean, with true indicating success.

I would prefer int because everywhere else in the code I'm using int.

>     +   {
>     +      struct stat stat_info;
>     +      if (stat(path, &stat_info) != 0)
>     +         return -1;
>     +
>     +      cache_size = stat_info.st_size;
>     +
>     +      fd = open(path, O_RDONLY);
>     +      if (fd) {
>     +         cache_mmap_p = cache_mmap = (char *)
>     +            mmap(NULL, cache_size, PROT_READ, MAP_PRIVATE, fd, 0);
>
>
> Will this work on Windows?

It won't, this is something I have left open for now. For Windows I 
could write alternative that simply reads the whole file to a buffer 
which would work on any system or then try to use a mmap equivalent 
('File mapping object'), don't have much knowledge on win API though.

>     +         return (cache_mmap == MAP_FAILED) ? -1 : 0;
>     +      }
>     +      return -1;
>     +   }
>     +
>     +   /* read from memory */
>     +   int map(const void *memory, size_t size)
>     +   {
>     +      cache_mmap_p = cache_mmap = (char *) memory;
>     +      cache_size = size;
>     +      return 0;
>     +   }
>     +
>     +   ~memory_map() {
>     +      if (cache_mmap) {
>     +         munmap(cache_mmap, cache_size);
>     +         close(fd);
>     +      }
>     +   }
>     +
>     +   /* move read pointer forward */
>     +   inline void ffwd(int len)
>     +   {
>     +      cache_mmap_p += len;
>     +   }
>     +
>     +   /* position of read pointer */
>     +   inline uint32_t position()
>     +   {
>     +      return cache_mmap_p - cache_mmap;
>     +   }
>     +
>     +   inline void read_string(char *str)
>     +   {
>     +      uint32_t len;
>     +      read(&len);
>     +      memcpy(str, cache_mmap_p, len);
>     +      str[len] = '\0';
>     +      ffwd(len);
>
>
> If the string is larger than the caller expects (or the contents of 
> the shader binary are corrupted), this function may overwrite 
> arbitrarily large areas of memory.  Since the shader binary may be 
> supplied by client code (in the case of OES_get_program_binary), this 
> is a security risk.
>
> I can think of two possible ways of fixing this:
>
> 1. Change this function so that it copies the string to a newly 
> allocated chunk of memory.
>
> 2. Change the format of the shader binary so that strings are stored 
> as null-terminated (rather than length followed by string data); that 
> way this function can simply return cache_mmap_p and fast forward to 
> the next byte after the end of the string.
>
> I'd lean in favor of #2.

Thanks, #2 sounds good to me.

>     +   }
>     +
>     +   /* read functions for different types */
>     +#define _READ_TYPE(type) inline void read(type *val) {\
>     +   *val = *(type *) cache_mmap_p;\
>     +   ffwd(sizeof(type));\
>     +}
>     +   _READ_TYPE(int32_t)
>     +   _READ_TYPE(uint32_t)
>     +   _READ_TYPE(bool)
>     +   _READ_TYPE(GLboolean)
>     +   _READ_TYPE(gl_texture_index)
>     +   _READ_TYPE(ir_expression_operation)
>     +
>     +   inline void read(void *dst, size_t size)
>     +   {
>     +      memcpy(dst, cache_mmap_p, size);
>     +      ffwd(size);
>     +   }
>     +
>     +   /* read and serialize a gl_shader */
>     +   inline struct gl_shader *read_shader(void *mem_ctx,
>     +      const char *mesa_sha, size_t size)
>     +   {
>     +      struct gl_shader *sha = _mesa_shader_unserialize(mem_ctx,
>     +         cache_mmap_p, mesa_sha, size);
>     +      ffwd(size);
>     +      return sha;
>     +   }
>
>
> It surprises me to see us fast-forward by a caller-supplied size 
> here.  Shouldn't we encode the size in the first 4 bytes of the shader 
> binary?

This is sort of how it works now, the size is just read on the caller 
side as I've left it open for caller to define the format of the file. 
But it is true that size is found now before the shader so should be 
able to move reading of it here.

>     +
>     +private:
>     +
>     +   int32_t fd;
>     +   int32_t cache_size;
>     +   char *cache_mmap;
>     +   char *cache_mmap_p;
>     +};
>     +
>     +
>     +/* class to read and write gl_shader */
>
>
> This is one of the cases where I need more information in the comment 
> above the class definition.  Is this a class that should be created 
> once per context, and represents a proxy to an on-disk cache of 
> shaders?  Or is it a class that should be created once per load/save 
> of a single shader, to aid in saving/loading the shader?  I'm guessing 
> it's the latter, because prototypes_only seems like the sort of thing 
> that will change from one serialization/deserialization to the next.
>
> Perhaps a clearer name for this class would be something like 
> "ir_serializer"?
>

Yes, it is of the latter form. I will add more documentation and rename 
the class.

> Also, it looks like there's very little crossover between the code 
> needed to serialize a shader and the code needed to deserialize a 
> shader.  Perhaps this should be split into two classes, ir_serializer 
> and ir_deserializer.

This is fine for me. I wanted to keep read and write methods 'close' to 
each other but I agree separation will make it more readable.

>     +struct ir_cache
>     +{
>     +public:
>     +   ir_cache(bool prototypes = false) :
>     +      prototypes_only(prototypes)
>     +   {
>     +      var_ht = hash_table_ctor(0, hash_table_string_hash,
>     +         hash_table_string_compare);
>     +   }
>     +
>     +   ~ir_cache()
>     +   {
>     +      hash_table_call_foreach(this->var_ht, delete_key, NULL);
>     +      hash_table_dtor(this->var_ht);
>     +   }
>     +
>     +   /* serialize gl_shader to memory  */
>     +   char *serialize(struct gl_shader *shader,
>     +      struct _mesa_glsl_parse_state *state,
>     +      const char *mesa_sha, size_t *size);
>
>
> Style nit-pick: the arguments should be lined up in the column after 
> the opening paren, like this:
>

ok, will fix

>    char *serialize(struct gl_shader *shader,
>                    struct _mesa_glsl_parse_state *state,
>                    const char *mesa_sha, size_t *size);
>
>     +
>     +   /* unserialize gl_shader from mapped memory */
>     +   struct gl_shader *unserialize(void *mem_ctx, memory_map &map,
>     +      uint32_t shader_size,
>     +      struct _mesa_glsl_parse_state *state,
>     +      const char *mesa_sha,
>     +      int *error_code);
>     +
>     +   enum cache_error {
>     +      GENERAL_READ_ERROR = -1,
>     +      DIFFERENT_MESA_SHA = -2,
>     +      DIFFERENT_LANG_VER = -3,
>     +   };
>     +
>     +   /**
>     +    * this method is public so that gl_shader_program
>     +    * unserialization can use it when reading the
>     +    * uniform storage
>     +    */
>     +   const glsl_type *read_glsl_type(memory_map &map,
>     +      struct _mesa_glsl_parse_state *state);
>     +
>     +private:
>     +
>     +   /* variables and methods required for serialization */
>     +
>     +   memory_writer blob;
>     +
>     +   bool prototypes_only;
>     +
>     +   /**
>     +    * writes ir_type and instruction dump size as a 'header'
>     +    * for each instruction before calling save_ir
>     +    */
>     +   int save(ir_instruction *ir);
>     +
>     +   int save_ir(ir_variable *ir);
>     +   int save_ir(ir_assignment *ir);
>     +   int save_ir(ir_call *ir);
>     +   int save_ir(ir_constant *ir);
>     +   int save_ir(ir_dereference_array *ir);
>     +   int save_ir(ir_dereference_record *ir);
>     +   int save_ir(ir_dereference_variable *ir);
>     +   int save_ir(ir_discard *ir);
>     +   int save_ir(ir_expression *ir);
>     +   int save_ir(ir_function *ir);
>     +   int save_ir(ir_function_signature *ir);
>     +   int save_ir(ir_if *ir);
>     +   int save_ir(ir_loop *ir);
>     +   int save_ir(ir_loop_jump *ir);
>     +   int save_ir(ir_return *ir);
>     +   int save_ir(ir_swizzle *ir);
>     +   int save_ir(ir_texture *ir);
>     +   int save_ir(ir_emit_vertex *ir);
>     +   int save_ir(ir_end_primitive *ir);
>     +
>     +
>     +   /* variables and methods required for unserialization */
>     +
>     +   struct _mesa_glsl_parse_state *state;
>     +   void *mem_ctx;
>     +
>     +   struct exec_list *top_level;
>     +   struct exec_list *prototypes;
>     +   struct exec_list *current_function;
>
>
> These private data members need to have documentation comments.  In 
> particular, since an exec_list can in principle store anything that 
> derives from exec_node, the comments should explain what kind of data 
> is stored in each list.

ok, more documentation will be added

>     +
>     +   int read_header(struct gl_shader *shader, memory_map &map,
>     +      const char *mesa_sha);
>     +   int read_prototypes(memory_map &map);
>     +
>     +   int read_instruction(struct exec_list *list, memory_map &map,
>     +      bool ignore = false);
>     +
>     +   int read_ir_variable(struct exec_list *list, memory_map &map);
>     +   int read_ir_assignment(struct exec_list *list, memory_map &map);
>     +   int read_ir_function(struct exec_list *list, memory_map &map);
>     +   int read_ir_if(struct exec_list *list, memory_map &map);
>     +   int read_ir_return(struct exec_list *list, memory_map &map);
>     +   int read_ir_call(struct exec_list *list, memory_map &map);
>     +   int read_ir_discard(struct exec_list *list, memory_map &map);
>     +   int read_ir_loop(struct exec_list *list, memory_map &map);
>     +   int read_ir_loop_jump(struct exec_list *list, memory_map &map);
>     +   int read_emit_vertex(struct exec_list *list, memory_map &map);
>     +   int read_end_primitive(struct exec_list *list, memory_map &map);
>     +
>     +   /* rvalue readers */
>     +   ir_rvalue *read_ir_rvalue(memory_map &map);
>     +   ir_constant *read_ir_constant(memory_map &map,
>     +      struct exec_list *list = NULL);
>     +   ir_swizzle *read_ir_swizzle(memory_map &map);
>     +   ir_texture *read_ir_texture(memory_map &map);
>     +   ir_expression *read_ir_expression(memory_map &map);
>     +   ir_dereference_array *read_ir_dereference_array(memory_map &map);
>     +   ir_dereference_record *read_ir_dereference_record(memory_map
>     &map);
>     +   ir_dereference_variable
>     *read_ir_dereference_variable(memory_map &map);
>     +
>     +   /**
>     +    * var_ht is used to store created ir_variables with a
>     unique_key for
>     +    * each so that ir_dereference_variable creation can find the
>     variable
>     +    */
>     +   struct hash_table *var_ht;
>     +
>     +   /**
>     +    * these 2 functions are ~copy-pasta from string_to_uint_map,
>
>
> I think you mean "copy-pasted" :)

yes!

>     +    * we need a hash here with a string key and pointer value
>     +    */
>     +   void hash_store(void * value, const char *key)
>     +   {
>     +      char *dup_key = _mesa_strdup(key);
>     +      bool result = hash_table_replace(this->var_ht, value, dup_key);
>     +      if (result)
>     +         free(dup_key);
>     +   }
>     +
>     +   static void delete_key(const void *key, void *data, void *closure)
>     +   {
>     +      (void) data;
>     +      (void) closure;
>     +      free((char *)key);
>     +   }
>     +
>     +};
>     +#endif /* ifdef __cplusplus */
>     +
>     +#endif /* IR_CACHE_H */
>
>
> Unfortunately, I'm out of time to review this patch series today.  I'm 
> not sure when I'll be able to get back to it.

Thanks for taking the time Paul! I will work the changes you proposed.

// Tapani

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131029/c161c14e/attachment-0001.html>