[Mesa-dev] [v2 3/6] glsl: ir_serializer class for the shader cache

Tapani Pälli tapani.palli at intel.com
Tue Nov 5 23:45:42 PST 2013


On 11/05/2013 07:36 PM, Paul Berry wrote:
> On 1 November 2013 02:16, Tapani Pälli <tapani.palli at intel.com> wrote:
>
>     ir_serializer can serialize a gl_shader structure as binary
>     data, will be used by OES_get_program_binary implementation.
>     +   uint8_t sampler_array = type->sampler_array;
>     +   uint8_t sampler_type = type->sampler_type;
>     +   uint8_t interface_packing = type->interface_packing;
>     +
>     +   blob.write_uint32(&base_type);
>
>
> There's no need for base_type to be 32 bits.  It's a tiny enum.

Sure, will change.

>     +   blob.write_uint32(&length);
>     +   blob.write_uint8(&vector_elms);
>     +   blob.write_uint8(&matrix_cols);
>     +   blob.write_uint8(&sampler_dimensionality);
>     +   blob.write_uint8(&sampler_shadow);
>     +   blob.write_uint8(&sampler_array);
>     +   blob.write_uint8(&sampler_type);
>     +   blob.write_uint8(&interface_packing);
>
>
> Two comments about this:
>
> 1. Having write_uint8 and write_uint32 take their arguments as 
> pointers is surprising, and it makes code like the above needlessly 
> verbose.
>
> I'd recommend changing write_uint8 and write_uint32 to take their 
> arguments as values; then we can just write:
>
> blob.write_uint32(type->base_type);
> blob.write_uint32(type->length);
> blob.write_uint8(vector_elms);
> ...and so on.

Yes, can be changed. Using a pointer is kind of a leftover from the 'make it 
look like fwrite' obsession I had.
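
For reference, a minimal sketch of what the value-taking variants could look 
like (the existing pointer-based write() stays as it is):

/* Sketch only: thin value-taking wrappers on top of memory_writer::write(),
 * so call sites can pass expressions directly. */
void
memory_writer::write_uint8(uint8_t value)
{
   write(&value, sizeof(value));
}

void
memory_writer::write_uint32(uint32_t value)
{
   write(&value, sizeof(value));
}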

> 2. I'm not sure it makes sense to serialize all of these fields of 
> glsl_type.  Really there are only 4 kinds of GLSL types: built-in 
> types, structs, interfaces, and arrays.  Most of the fields above only 
> matter for built-in types.  And for built-in types, we don't want to 
> serialize the entire glsl_type structure; we simply want to record 
> which built-in type is being referred to so that when the shader is 
> loaded later, we can look up one of the existing flyweights.
>

There are user-defined structures, arrays, and arrays of user-defined 
structures; to reconstruct those, most (if not all) of this data is needed. 
For built-in types I agree that this could be made much smaller and faster. 
I'm not sure there is currently a way to detect whether a type is a built-in, 
but it could be added.

> I have additional thoughts further below about saving GLSL types that 
> would make it unnecessary to save built-in types at all.
>
>     +
>     +   if (type->base_type == GLSL_TYPE_ARRAY)
>     +      save_glsl_type(blob, type->element_type());
>     +   else if (type->base_type == GLSL_TYPE_STRUCT ||
>     +      type->base_type == GLSL_TYPE_INTERFACE) {
>     +      glsl_struct_field *field = type->fields.structure;
>     +      for (unsigned k = 0; k < length; k++, field++) {
>     +         blob.write_string(field->name);
>     +         save_glsl_type(blob, field->type);
>     +         uint8_t row_major = field->row_major;
>     +         uint8_t interpolation = field->interpolation;
>     +         uint8_t centroid = field->centroid;
>     +         blob.write_uint8(&row_major);
>     +         blob.write_int32(&field->location);
>     +         blob.write_uint8(&interpolation);
>     +         blob.write_uint8(&centroid);
>
>
> I'd like to make one more argument in favor of having the 
> serialization and deserialization functions in the classes 
> themselves.  If we did so, then we could have a glsl_type::serialize() 
> function as well as a glsl_struct_field::serialize() function.  Then 
> the code above would collapse into simply:
>
> else if(type->base_type == GLSL_TYPE_STRUCT ||
>         type->base_type == GLSL_TYPE_INTERFACE) {
>    for (unsigned k = 0; k < length; k++)
>       type->fields.structure[k].serialize(blob);
> }

Yes, I think you have already sold me on this idea. I'll refactor the code as 
methods of the IR classes, but then it would also be nice to modify the IR 
classes themselves (which I was asked not to do when starting this work ...), 
for example to group the boolean variables and bitfields into a data section 
of their own that can be dumped in one go, without the cache needing to know 
the exact types of those fields.
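
For example glsl_struct_field could then serialize itself roughly like this 
(a sketch, assuming the value-taking writers above and a glsl_type::serialize() 
counterpart):

/* Sketch: a glsl_struct_field writes out its own members. */
void
glsl_struct_field::serialize(memory_writer &blob) const
{
   blob.write_string(name);
   type->serialize(blob);     /* assumes glsl_type gets a serialize() too */
   blob.write_uint8(row_major);
   blob.write_int32(location);
   blob.write_uint8(interpolation);
   blob.write_uint8(centroid);
}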

>     +      }
>     +   }
>     +
>     +   ir_len = blob.position() - start_pos - sizeof(ir_len);
>     +   blob.overwrite(&ir_len, sizeof(ir_len), start_pos);
>     +   return 0;
>     +}
>     +
>     +
>     +/**
>     + * Function to create an unique string for a ir_variable. This is
>     + * used by variable dereferences to indicate the exact ir_variable
>     + * when deserialization happens.
>
>
> I still don't understand why we don't just use the pointer for this 
> purpose.  It's unique, and it takes up much less storage than 
> <name>_<decimal address>.

We need a unique name to refer to on the reading side, so it is constructed 
here already. It could also be just a counter, like is used elsewhere (to get 
assignment_tmp@1, assignment_tmp@2), but I wanted to keep this code minimal 
as it's just unique naming.


>     + */
>     +static char *_unique_name(ir_variable *var)
>     +{
>     +   return ralloc_asprintf(NULL,
>     +      "%s_%ld", var->name ? var->name : "parameter", (intptr_t) var);
>     +}
>     +
>     +
>     +int
>     +ir_serializer::save_ir(ir_variable *ir)
>     +{
>     +   /* name can be NULL, see ir_print_visitor for explanation */
>     +   const char *non_null_name = ir->name ? ir->name : "parameter";
>     +   char *uniq = _unique_name(ir);
>     +
>     +   save_glsl_type(blob, ir->type);
>
>
> It seems really wasteful to have the save_ir() functions for 
> ir_variable, ir_constant, ir_expression, ir_function_signature, and 
> ir_texture all call save_glsl_type().  The fact is that in any given 
> GLSL program, there is a small set of types that are used over and 
> over again, and they are mostly built-in types. Calling 
> save_glsl_type() everywhere a type appears in the IR is going to cause 
> an enormous amount of bloat in the file size.
>
> Here's an alternative idea:
>
> 1. While serializing, keep track of a hashtable that maps each 
> glsl_type to a small integer.  Instead of storing the entire type when 
> we're serializing, just store the small integers.  While serializing, 
> if we encounter a type that's not in the hash table, assign it to the 
> next available small integer.
>
> 2. Preload this hashtable with a fixed static mapping for built-in 
> types (e.g. 0=float, 1=vec2, 2=vec3, 3=vec4, 4=int, and so on).
>
> 3. After serializing the IR for the shader, go through the hashtable 
> and serialize all of the non-built-in types with their associated 
> integers.
>
> 4. When deserializing, read the hashtable mapping integers to types 
> first.  Then, while deserializing the IR, each integer can just be 
> looked up in the hashtable to find the associated glsl_type.
>
> This should cut dramatically down on storage size, which should make 
> the loading process much faster.  It also avoids the need to store the 
> built-in types at all.

This sounds good to me; this hash could be part of the 'prototypes' section.
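
Roughly what I have in mind for the writer side (sketch only; type_ht and 
next_type_id would be new ir_serializer members, using the hash table from 
program/hash_table.h):

/* Sketch: map each glsl_type to a small integer id while serializing.
 * Built-in types are preloaded with fixed ids; a user defined type gets
 * the next free id the first time it is seen and is written out later in
 * the prototypes section.  Ids start at 1 so that a NULL result from
 * hash_table_find() means "not seen yet". */
uint32_t
ir_serializer::type_id(const glsl_type *type)
{
   void *entry = hash_table_find(type_ht, type);
   if (entry)
      return (uint32_t) (intptr_t) entry;

   uint32_t id = next_type_id++;
   hash_table_insert(type_ht, (void *) (intptr_t) id, type);
   return id;
}

Call sites in the save_ir() functions would then just write this small id 
instead of calling save_glsl_type().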

8<

>     +   blob.write_uint8(&has_constant_value);
>     +   blob.write_uint8(&has_constant_initializer);
>
>
> It seems backwards that in the in-memory representation (the 
> ir_variable class) we pack things like read_only, centroid, invariant, 
> etc. tightly into bitfields, but we serialize it out to a byte-aligned 
> data structure. Usually people try to make the serialized data format 
> more tightly packed than the in-memory format, because the tradeoffs 
> for serialized data weigh more strongly in favor of packing (access 
> speed is less important; small footprint is more important).
>
> I think we should consider ways to maintain the tight bitfield packing 
> in the serialized data format.  Here's one idea: how about if we 
> refactor ir_variable so that it contains a substructure called "bits", 
> which contains all the fields that are simple integers, packed 
> carefully together.  Then, here in the serialization function, we just 
> write out the "bits" data structure verbatim using a single call to 
> memory_writer::write().
>
> In addition to making the file size smaller and the 
> serialization/deserialization code simpler and faster, this would have 
> the advantage that if we add new fields to ir_variable::bits in the 
> future, we won't have to go to any effort to ensure that they will be 
> serialized correctly.  The right thing will just happen automatically.

I'm all for doing this; it is what I was proposing when discussing the cache 
implementation with Ian and Kenneth some months ago. Back then the conclusion 
was not to modify the IR contents but rather to have a working implementation 
first and optimize later, but maybe I've reached that point now :) Ian, could 
you comment on this? Having a separate data structure within each instruction 
would also make it possible to use a helper data-serialization library if 
wanted.
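
Something like the following is what I had in mind (sketch only; the field 
widths and the exact member list are just illustrative):

/* Sketch: pack the small integer/boolean fields of ir_variable into one
 * plain substructure that can be written and read back verbatim. */
struct ir_variable_bits {
   unsigned read_only:1;
   unsigned centroid:1;
   unsigned invariant:1;
   unsigned interpolation:2;
   unsigned mode:4;
   /* ... and the rest of the bitfield members ... */
};

/* serializing all of them then becomes a single call */
blob.write(&ir->bits, sizeof(ir->bits));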

>     +
>     +   for (unsigned i = 0; i < ir->num_state_slots; i++) {
>     +      blob.write_int32(&ir->state_slots[i].swizzle);
>
>
> Swizzles are unsigned 8-bit values.  This should be write_uint8.

OK, maybe the struct could also be changed to use only 8 bits instead of an 
int then.

>     +      for (unsigned j = 0; j < 5; j++) {
>     +         blob.write_int32(&ir->state_slots[i].tokens[j]);
>
>
> Tokens are always either small integers or references to the 
> gl_state_index_ enum.  So these can be stored as uint8's.  (It might 
> be worth adding an assertion just to be on the safe side though).

These are of type int in the state_slots structure as well.
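
If we are fine with changing the structure itself (ir_state_slot, if I 
remember the name right), a sketch of what that would allow:

/* Sketch: with 8-bit fields the whole state_slots array of a variable
 * could be dumped with a single write (plus an assert on the write side
 * that the values really fit). */
struct ir_state_slot {
   uint8_t tokens[5];
   uint8_t swizzle;
};

/* in ir_serializer::save_ir(ir_variable *ir) */
blob.write(ir->state_slots, ir->num_state_slots * sizeof(ir->state_slots[0]));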

>     +      }
>     +   }
>     +
>     +   CACHE_DEBUG("save ir_variable [%s] [%s]\n", non_null_name, uniq);
>     +
>     +   ralloc_free(uniq);
>     +
>     +   if (ir->constant_value)
>     +      if (save(ir->constant_value))
>     +         return -1;
>     +
>     +   if (ir->constant_initializer)
>     +      if (save(ir->constant_initializer))
>     +         return -1;
>     +
>     +   uint8_t has_interface_type =
>     +      ir->is_interface_instance() ? 1 : 0;
>     +
>     +   blob.write_uint8(&has_interface_type);
>     +
>     +   if (has_interface_type)
>     +      save_glsl_type(blob, ir->get_interface_type());
>
>
> We talked about this some previously, but I really think the 
> convention of having these serialization functions return "int" (with 
> 0 indicating success) is going to cause maintenance problems for us.  
> Here's my reasoning:
>
> 1. Since 0 means false we have to do mental gymnastics every time we 
> see something like:
>
> if (save(ir->constant_value))
>    return -1;
>
> to remind ourselves that the branch is taken on failure, not success.
>
> 2. If we really are trying to plan for a future where these ints can 
> be return codes instead of success/failure indicators, then doing this:
>
> if (save(ir->constant_value))
>    return -1;
>
> won't work anyway.  We'd have to do something like this:
>
> int retval = save(ir->constant_value));
> if (retval)
>    return retval;
>
> 3. It's going to be very hard to remember to check the return values 
> on these functions.  You yourself missed checking the return value here:
>
> if (has_interface_type)
>    save_glsl_type(blob, ir->get_interface_type());
>
> 4. All of this is speculative future-proofing anyway. The only places 
> in ir_cache_serializer that I can find that would actually generate a 
> nonzero return code are: (a) if we find an unrecognized ir_type (would 
> be better handled by an assertion, or a pure virtual function, as I 
> note below) and (b) if memory allocation fails in 
> memory_writer::grow() (which I thought we agreed to remove).
>
> I recommend ripping out all of these integer return types and 
> returning void.  If, in the future, there are any legitimate error 
> conditions that we want to track, we should add an error state to 
> memory_writer (or possibly ir_serializer) and then check it at the end 
> of serialization.  That's the technique we use for handling parsing 
> and compilation errors elsewhere in Mesa, and it's served us very well.

Removing these is sort of OK for me at this point; what it does is move these 
possible serialization errors to the point where the binary is read back 
during deserialization. The good thing about having these -1's here is that 
they catch the failure early, so we never even attempt to parse such a 
binary. This was quite useful during development, as I was simply leaving out 
shaders that the cache could not handle, and this could be done in a very 
fine-grained way. As an imaginary example, a shader with some tricky 
expression was left out while all the other shaders were successfully cached. 
I'm not sure whether we should try to make the cache 100% bulletproof or 
allow it to bypass shaders that are simply not supported by the cache yet, 
for example because of some newly added feature?
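
If we go the error-state route, a sketch of how the early bail-out could 
still work (names are placeholders):

/* Sketch: a sticky error flag in memory_writer replaces the int return
 * values; unsupported cases just mark the blob as failed and the whole
 * shader is skipped once, at the end of serialization. */
class memory_writer {
public:
   void set_error() { error = true; }
   bool has_error() const { return error; }
   /* ... existing members ... */
private:
   bool error;
};

/* in the "not implemented" paths of the serializer */
blob.set_error();

/* after serializing one shader */
if (blob.has_error())
   return; /* do not cache this shader */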

>     +int
>     +ir_serializer::save_ir(ir_expression *ir)
>     +{
>     +   int32_t num_operands = ir->get_num_operands();
>     +   uint32_t exp_operation = ir->operation;
>     +
>     +   save_glsl_type(blob, ir->type);
>
>
> Is it necessary to save the glsl_type for ir_expression nodes?  
> ir_expression has constructors which can infer the type based on the 
> type of the arguments.

So far I have thought it is; it is the type that results from the expression. 
But I can change this to use such a constructor instead.

>     +
>     +   blob.write_uint32(&exp_operation);
>     +   blob.write_int32(&num_operands);
>     +
>     +   /* operand ir_type below is written to make parsing easier */
>     +   for (unsigned i = 0; i < ir->get_num_operands(); i++) {
>     +      uint32_t ir_type = ir->operands[i]->ir_type;
>     +      blob.write_uint32(&ir_type);
>     +      if (save(ir->operands[i]))
>     +         return -1;
>     +   }
>     +   return 0;
>     +}
>     +
>     +
>     +int
>     +ir_serializer::save_ir(ir_function *ir)
>     +{
>     +   uint32_t sig_amount = 0;
>
>
> Can we rename this to "num_signatures"?  "amount" usually implies a 
> quantity that you don't measure by counting.

ok

>     +ir_serializer::save_ir(ir_function_signature *ir)
>     +{
>     +   int32_t par_count = 0;
>
>
> Can we rename this to something like "param_count"? It's not obvious 
> what "par" is an abbreviation for.

ok

>     +   if (prototypes_only)
>     +      return 0;
>
>
> I'm a little concerned about this.  If we ever make a mistake where 
> prototypes_only doesn't match up properly between the serializer and 
> the deserializer, the deserializer will get hopelessly lost.
>
> We could easily fix this by outputting a body_size of 0 whenever 
> prototypes_only is true.

I don't think such a mismatch would happen, but I can change this.
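
Something like this then (assuming the value-taking writers discussed 
earlier):

/* Sketch: keep the stream self-describing even when only prototypes are
 * written, by emitting a body size of 0. */
if (prototypes_only) {
   blob.write_uint32(0);   /* body_size: no instructions follow */
   return 0;
}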

>     +
>     +   /* function body */
>     +   foreach_iter(exec_list_iterator, iter, ir->body) {
>     +      ir_instruction *const inst = (ir_instruction *) iter.get();
>     +      CACHE_DEBUG("   body instruction node type %d\n",
>     inst->ir_type);
>     +      if (save(inst))
>     +         return -1;
>     +   }
>     +
>     +   return 0;
>     +}
>     +
>     +
>     +int
>     +ir_serializer::save_ir(ir_assignment *ir)
>     +{
>     +   uint32_t write_mask = ir->write_mask;
>     +
>     +   blob.write_uint32(&write_mask);
>     +
>     +   /* lhs (ir_deference_*) */
>     +   uint32_t lhs_type = ir->lhs->ir_type;
>     +   blob.write_uint32(&lhs_type);
>
>
> Isn't this redundant?  The call to save(ir->lhs) just below will 
> resolve to ir_serializer::save(ir_instruction *), which writes out the 
> ir_type as its first action.

These 'extra ir_type writes' are there for parsing, so that each instruction 
knows what type it is going to get and can call a different function to parse 
it. I have since added a 'read_rvalue', though, so I think most of these 
cases could be removed now. I will investigate whether there are still cases 
where an instruction needs to know the type in advance.

>     +
>     +   if (save(ir->lhs))
>     +      return -1;
>     +
>     +   if (ir->condition) {
>     +      CACHE_DEBUG("%s: assignment has condition, not supported",
>     __func__);
>
>
> Why don't we support this?

I believe this just never triggered for me, maybe because optimizations 
removed the conditions. Support can be added and verified once I have such a 
shader.

>     +ir_serializer::save_ir(ir_swizzle *ir)
>     +{
>     +   uint32_t components = ir->mask.num_components;
>     +   const uint32_t mask[4] = {
>     +      ir->mask.x,
>     +      ir->mask.y,
>     +      ir->mask.z,
>     +      ir->mask.w
>     +   };
>     +   uint32_t val_type = ir->val->ir_type;
>     +
>     +   blob.write_uint32(&components);
>     +   blob.write(&mask, 4 * sizeof(mask[0]));
>
>
> Why not just serialize the ir_swizzle_mask data structure in raw 
> form?  It will take up less space and be faster.

Sure, can be changed. I can see now that the structure itself has also 
changed since this code was written.

>     +   blob.write_int32(&op);
>     +   blob.write_uint8(&has_coordinate);
>     +   blob.write_uint8(&has_projector);
>     +   blob.write_uint8(&has_shadow_comp);
>     +   blob.write_uint8(&has_offset);
>
>
> This would be another good candidate for creating a "bits" 
> substructure, as I recommended with ir_variable.

Agreed.

>     +   blob.write_uint8(&has_dpdy);
>
>
> There are a number of places in the serializer where we have to go to 
> extra effort to serialize out a 1 or 0 to indicate whether an rvalue 
> is present, and then only serialize the rvalue if it is indeed present.
>
> How about if instead we modify ir_serializer::save(ir_instruction *) 
> so that if it's passed a NULL pointer, it simply serializes out a 
> magic ir_type value that means "NULL"?  (We could use ir_type_unset, 
> which is otherwise unused by the IR).  This would reduce our risk of 
> bugs because there would be no danger of forgetting to include this 
> logic in a place where an ir_rvalue is allowed to be NULL.

I'm OK with this proposal. I have had this change in mind too, but I was a 
bit afraid it would make error detection harder for the parser code. I will 
think it through; I agree it would also make the code much more readable.
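
A sketch of how it could look (ir_type written as a single byte here, per 
your comment further down):

/* Sketch: a NULL rvalue is serialized as the magic ir_type_unset value,
 * so the reader can tell "no operand here" apart from a real instruction. */
int
ir_serializer::save(ir_instruction *ir)
{
   if (!ir) {
      blob.write_uint8(ir_type_unset);
      return 0;
   }

   blob.write_uint8(ir->ir_type);
   /* ... dispatch to the per-class save_ir() as before ... */
   return 0;
}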

>     +
>     +   foreach_iter(exec_list_iterator, iter, ir->else_instructions) {
>     +      ir_instruction *const inst = (ir_instruction *) iter.get();
>     +      CACHE_DEBUG("   ir_if else instruction node type %d\n",
>     inst->ir_type);
>     +      if (save(inst))
>     +         return -1;
>     +   }
>
>
> I've seen a few places where this loop is repeated. How about if we 
> make a function whose job is to serialize an exec_list of 
> ir_instructions (including serializing its length)?

Agreed, good idea.
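
A sketch of the helper (again assuming the value-taking writers):

/* Sketch: serialize an exec_list of instructions, prefixed with the number
 * of instructions so the reader knows how many to expect. */
int
ir_serializer::save_list(exec_list &list)
{
   uint32_t list_len = 0;
   foreach_iter(exec_list_iterator, iter, list)
      list_len++;

   blob.write_uint32(list_len);

   foreach_iter(exec_list_iterator, iter, list) {
      ir_instruction *const inst = (ir_instruction *) iter.get();
      if (save(inst))
         return -1;
   }
   return 0;
}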


>     +
>     +int ir_serializer::save_ir(ir_loop_jump *ir)
>     +{
>     +   uint32_t mode = ir->mode;
>     +   return blob.write_uint32(&mode);
>
>
> mode is a tiny enum and should be serialized as a uint8.

will fix

>     +   blob.write_uint32(&ir_type);
>
>
> ir_type is a small enum and should be serialized as a uint8.

will fix


>     +   case ir_type_emit_vertex:
>     +      SAVE_IR(ir_emit_vertex);
>     +      break;
>     +   case ir_type_end_primitive:
>     +      SAVE_IR(ir_end_primitive);
>     +      break;
>     +
>     +   default:
>     +      CACHE_DEBUG("%s: error, type %d not implemented\n",
>     +                  __func__, ir->ir_type);
>     +      return -1;
>     +   }
>
>
> Another advantage of moving the serialization logic into the IR 
> classes is that we could do this dispatch by a pure virtual function 
> rather than a switch statement. Then there would be no need to have a 
> default case to do error handling if we encounter an unexpected 
> ir_type. Instead, the compiler would check that each type of IR had a 
> corresponding implementation of the pure virtual.

True, there is no need for this then.
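
For the record, the dispatch would then boil down to something like this 
(sketch):

/* Sketch: in ir.h every concrete IR class has to implement this, so a
 * missing case becomes a compile error instead of a runtime default. */
class ir_instruction {
public:
   virtual void serialize(memory_writer &blob) = 0;
   /* ... existing members ... */
};

/* ...and the big switch in the serializer collapses to */
inst->serialize(blob);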

>     +_dump_bool(bool value, memory_writer &blob)
>     +{
>     +   uint8_t val = value;
>     +   blob.write_uint8(&val);
>     +}
>
>
> This seems like it should be in the memory_writer class as
>
> void memory_writer::write_bool(bool value);

True, I will add this there.
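
It will be something like:

/* Sketch of the new helper in memory_writer */
void
memory_writer::write_bool(bool value)
{
   uint8_t tmp = value ? 1 : 0;
   write(&tmp, sizeof(tmp));
}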

>     +   _dump_bool(state->ARB_gpu_shader5_warn, blob);
>     +   _dump_bool(state->AMD_vertex_shader_layer_enable, blob);
>     +   _dump_bool(state->AMD_vertex_shader_layer_warn, blob);
>     +   _dump_bool(state->ARB_shading_language_420pack_enable, blob);
>     +   _dump_bool(state->ARB_shading_language_420pack_warn, blob);
>     +   _dump_bool(state->EXT_shader_integer_mix_enable, blob);
>     +   _dump_bool(state->EXT_shader_integer_mix_warn, blob);
>
>
> This is another case where I'm concerned about us forgetting to update 
> this code when we add new extension flags.  I think this would be a 
> good candidate for the "bits" idea that I talked about above with 
> ir_variable. Although in this case I might call the structure 
> something like "enables" rather than "bits".

Yes, I'll make a separate patch to collect these flags into a structure that 
is easy to serialize.
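
A sketch of the direction (the struct name and the member list here are just 
placeholders):

/* Sketch: move the *_enable/*_warn flags of _mesa_glsl_parse_state into
 * one plain struct of bools that can be dumped with a single write. */
struct glsl_extension_enables {
   bool ARB_gpu_shader5_enable;
   bool ARB_gpu_shader5_warn;
   bool ARB_shading_language_420pack_enable;
   bool ARB_shading_language_420pack_warn;
   bool EXT_shader_integer_mix_enable;
   bool EXT_shader_integer_mix_warn;
   /* ... */
};

/* serialization then becomes */
blob.write(&state->enables, sizeof(state->enables));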

> +   CACHE_DEBUG("write %d prototypes\n", total);
>
>     +
>     +   foreach_list_const(node, shader->ir) {
>     +      ir_instruction *const inst = (ir_instruction *) node;
>     +      if (inst->as_variable())
>     +         if (save(inst))
>     +            goto write_errors;
>     +   }
>     +
>     +   foreach_list_const(node, shader->ir) {
>     +      ir_instruction *const inst = (ir_instruction *) node;
>     +      if (inst->as_function())
>     +         if (save(inst))
>     +            goto write_errors;
>     +   }
>
>
> Why is it necessary to save the variables and instructions first?

This is because during parsing we might encounter a call to a function that 
does not exist yet; the same goes for variable references. Another way would 
be to modify the reading side to make two passes over the data, but I chose 
this approach because the reader originally did not use mmap, so this was 
simply easier.

Thanks for the review;

// Tapani
