[Mesa-dev] [PATCH 3/4] glsl: add ir_cache class and functions for shader serialization
Tapani Pälli
tapani.palli at intel.com
Thu Oct 31 08:09:42 CET 2013
On 10/29/2013 03:48 PM, Paul Berry wrote:
> On 29 October 2013 00:53, Tapani <tapani.palli at intel.com
> <mailto:tapani.palli at intel.com>> wrote:
>
> On 10/28/2013 10:09 PM, Paul Berry wrote:
>>
>> +static uint32_t _unique_id(ir_variable *var)
>> +{
>> + char buffer[256];
>> + _mesa_snprintf(buffer, 256, "%s_%p", var->name, var);
>> + return _mesa_str_checksum(buffer);
>>
>>
>> Two problems:
>>
>> 1. This is a lot of work to obtain a unique identifier. The
>> pointer is already unique; we should just use that.
>>
>
> I was originally using a pointer but then there's the 32 vs 64bit
> address problem. I wanted to avoid usage of pointers.
>
>
>> 2. There's no guarantee that the checksum of the string will be
>> unique. Because of the birthday paradox, checksum collisions are
>> probably going to happen all the time.
>>
>
> I think there is because the pointer address is unique and is used
> as part of the string.
>
>
> Yes, but the checksumming process destroys uniqueness (it has to,
> because the output of the checksum operation is a 32-bit int, and the
> input is a string. There are more than 2^32 possible strings, so by
> the pigeon hole principle at least one checksum must correspond to
> more than one possible input string). In the case of
> _mesa_str_checksum() (which is not a particularly good hash from a
> cryptographic perspective) the collisions are pretty easy to come by.
> For example, these were the first two collisions I found by writing a
> quick test program:
>
> printf("checksum of \"%s\" is 0x%x\n", "foo_0000003c",
> _mesa_str_checksum("foo_0000003c"));
> printf("checksum of \"%s\" is 0x%x\n", "foo_000001a8",
> _mesa_str_checksum("foo_000001a8"));
>
> checksum of "foo_0000003c" is 0x1360
> checksum of "foo_000001a8" is 0x1360
>
OK, I'll put the address itself back in use then.
>
>
>> Instead of going to a lot of work to generate a unique id for
>> each ir_variable, I think we should just serialize the
>> ir_variable pointer address. If we are trying to use the same
>> binary format for 32-bit and 64-bit builds (see my clarifying
>> question at the top of this email), we'll need to extend the
>> pointer to 64 bits before serializing it.
>>
>> Alternatively, if you want to keep the unique ID's small, we'll
>> have to use some hashtable mechanism to map each variable to its
>> own ID.
>
> OK, I will verify if this is a problem. For me it does not seem
> like a problem as I'm using the address there.
>
>
>> +}
>> +
>> +
>> +/* data required to serialize ir_variable */
>> +struct ir_cache_variable_data
>>
>>
>> As with glsl_type_data, I don't understand why we need to have
>> this class rather than just adding a serialize() method to the
>> ir_variable class.
>
> This could be a serialize function in ir_cache_serialize.cpp. I
> did not want to touch existing ir classes, I think it helps to
> have the cache code separated and in one place. This is of course
> debatable, every ir instruction could have serialize() and
> unserialize() to deal with this.
>
>
> My chief worry with separating them is that people will forget to
> update the serialization code when they change the ir class, because
> the thing they need to update will be far away. This is particularly
> a problem when adding new members to the class (something we do quite
> frequently when implementing new GLSL features).
I understand and I've been worried about this from the maintainability
point of view as well. I've thought to add some documentation on the
ir.h file about this.
// Tapani
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131031/a3d83399/attachment.html>
More information about the mesa-dev
mailing list