[Mesa-dev] [PATCH 3/4] glsl: add ir_cache class and functions for shader serialization

Thu Oct 31 08:09:42 CET 2013

On 10/29/2013 03:48 PM, Paul Berry wrote:
> On 29 October 2013 00:53, Tapani <tapani.palli at intel.com 
> <mailto:tapani.palli at intel.com>> wrote:
>
>     On 10/28/2013 10:09 PM, Paul Berry wrote:
>>
>>         +static uint32_t _unique_id(ir_variable *var)
>>         +{
>>         +   char buffer[256];
>>         +   _mesa_snprintf(buffer, 256, "%s_%p", var->name, var);
>>         +   return _mesa_str_checksum(buffer);
>>
>>
>>     Two problems:
>>
>>     1. This is a lot of work to obtain a unique identifier.  The
>>     pointer is already unique; we should just use that.
>>
>
>     I was originally using a pointer but then there's the 32 vs 64bit
>     address problem. I wanted to avoid usage of pointers.
>
>
>>     2. There's no guarantee that the checksum of the string will be
>>     unique. Because of the birthday paradox, checksum collisions are
>>     probably going to happen all the time.
>>
>
>     I think there is because the pointer address is unique and is used
>     as part of the string.
>
>
> Yes, but the checksumming process destroys uniqueness (it has to, 
> because the output of the checksum operation is a 32-bit int, and the 
> input is a string.  There are more than 2^32 possible strings, so by 
> the pigeon hole principle at least one checksum must correspond to 
> more than one possible input string).  In the case of 
> _mesa_str_checksum() (which is not a particularly good hash from a 
> cryptographic perspective) the collisions are pretty easy to come by.  
> For example, these were the first two collisions I found by writing a 
> quick test program:
>
>    printf("checksum of \"%s\" is 0x%x\n", "foo_0000003c",
>           _mesa_str_checksum("foo_0000003c"));
>    printf("checksum of \"%s\" is 0x%x\n", "foo_000001a8",
>           _mesa_str_checksum("foo_000001a8"));
>
> checksum of "foo_0000003c" is 0x1360
> checksum of "foo_000001a8" is 0x1360
>

OK, I'll put the address itself back in use then.

>
>
>>     Instead of going to a lot of work to generate a unique id for
>>     each ir_variable, I think we should just serialize the
>>     ir_variable pointer address.  If we are trying to use the same
>>     binary format for 32-bit and 64-bit builds (see my clarifying
>>     question at the top of this email), we'll need to extend the
>>     pointer to 64 bits before serializing it.
>>
>>     Alternatively, if you want to keep the unique ID's small, we'll
>>     have to use some hashtable mechanism to map each variable to its
>>     own ID.
>
>     OK, I will verify if this is a problem. For me it does not seem
>     like a problem as I'm using the address there.
>
>
>>         +}
>>         +
>>         +
>>         +/* data required to serialize ir_variable */
>>         +struct ir_cache_variable_data
>>
>>
>>     As with glsl_type_data, I don't understand why we need to have
>>     this class rather than just adding a serialize() method to the
>>     ir_variable class.
>
>     This could be a serialize function in ir_cache_serialize.cpp. I
>     did not want to touch existing ir classes, I think it helps to
>     have the cache code separated and in one place. This is of course
>     debatable, every ir instruction could have serialize() and
>     unserialize() to deal with this.
>
>
> My chief worry with separating them is that people will forget to 
> update the serialization code when they change the ir class, because 
> the thing they need to update will be far away.  This is particularly 
> a problem when adding new members to the class (something we do quite 
> frequently when implementing new GLSL features).

I understand and I've been worried about this from the maintainability 
point of view as well. I've thought to add some documentation on the 
ir.h file about this.

// Tapani

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131031/a3d83399/attachment.html>