[Mesa-dev] [PATCH 7/7] util/disk_cache: also write key_blob to cache entry

Thu Mar 16 12:40:21 UTC 2017

On Thu, Mar 16, 2017 at 3:28 AM, Timothy Arceri <tarceri at itsqueeze.com> wrote:
> On 16/03/17 10:09, Grazvydas Ignotas wrote:
>>
>> This can be used to deal with key hash collisions from different
>> version (should we find that to actually happen) and to find
>> which mesa version produced it the entry.
>>
>> Signed-off-by: Grazvydas Ignotas <notasas at gmail.com>
>> ---
>>  src/util/disk_cache.c | 24 ++++++++++++++++++++++--
>>  1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/util/disk_cache.c b/src/util/disk_cache.c
>> index ad591be..8ac9e48 100644
>> --- a/src/util/disk_cache.c
>> +++ b/src/util/disk_cache.c
>> @@ -766,6 +766,7 @@ destroy_put_job(void *job, int thread_index)
>>  struct cache_entry_file_data {
>>     uint32_t crc32;
>>     uint32_t uncompressed_size;
>> +   uint32_t key_blob_size;
>>  };
>>
>>  static void
>> @@ -839,6 +840,7 @@ cache_put(void *job, int thread_index)
>>     struct cache_entry_file_data cf_data;
>>     cf_data.crc32 = util_hash_crc32(dc_job->data, dc_job->size);
>>     cf_data.uncompressed_size = dc_job->size;
>> +   cf_data.key_blob_size = dc_job->cache->key_blob_size;
>>
>>     size_t cf_data_size = sizeof(cf_data);
>>     ret = write_all(fd, &cf_data, cf_data_size);
>> @@ -847,6 +849,16 @@ cache_put(void *job, int thread_index)
>>        goto done;
>>     }
>>
>> +   /* Write the key_blob, this can be used find information about the
>> +    * mesa version that produced the entry or deal with hash collisions,
>> +    * should that ever become a real problem.
>> +    */
>> +   ret = write_all(fd, dc_job->cache->key_blob,
>> dc_job->cache->key_blob_size);
>> +   if (ret == -1) {
>> +      unlink(filename_tmp);
>> +      goto done;
>> +   }
>> +
>>     /* Now, finally, write out the contents to the temporary file, then
>>      * rename them atomically to the destination filename, and also
>>      * perform an atomic increment of the total cache size.
>> @@ -859,7 +871,7 @@ cache_put(void *job, int thread_index)
>>     }
>>     rename(filename_tmp, filename);
>>
>> -   file_size += cf_data_size;
>> +   file_size += cf_data_size + dc_job->cache->key_blob_size;
>>     p_atomic_add(dc_job->cache->size, file_size);
>>
>>   done:
>> @@ -966,8 +978,16 @@ disk_cache_get(struct disk_cache *cache, const
>> cache_key key, size_t *size)
>>           goto fail;
>>     }
>>
>> +   if (cf_data.key_blob_size != cache->key_blob_size)
>> +      goto fail;
>> +
>> +   /* Right now we don't use the key_blob, just skip it */
>> +   ret = lseek(fd, cache->key_blob_size, SEEK_CUR);
>> +   if (ret == -1)
>> +      goto fail;
>> +
>>     /* Load the actual cache data. */
>> -   size_t cache_data_size = sb.st_size - cf_data_size;
>> +   size_t cache_data_size = sb.st_size - cf_data_size -
>> cache->key_blob_size;
>>     for (len = 0; len < cache_data_size; len += ret) {
>>        ret = read(fd, data + len, cache_data_size - len);
>>        if (ret == -1)
>>
>
> I'm not sure how much I like writing this as a blob. It makes reading the
> header difficult.

I was thinking how to make it parsable and decided to make it an
opaque blob. You see, this is becoming kind of an user ABI that will
be difficult to maintain without breaking existing tools because
things are rapidly changing in projects like mesa. We don't know what
will be needed in future and if those 3 strings you listed below will
be enough, not to mention driver differences. For example, it might
easily turn out that shaders also have a kernel dependency (kernel
enables some feature in GPU needed for shader to run), or there is
some unexpected dependency on libdrm or something, and we need to take
their version too. What will we do, add a few more version strings?
But the tools only look at first 3. Concatenate? Then why have mesa
and llvm separate?

I guess the tools could use conventional means to to find the metadata
they need (GPU name, etc) through GL strings, set the cache directory
and compile a test shader to get the version blob and use that to
identify other entries to collect. Otherwise if you think it must be
strings, I'd say it should be just 2 strings, combined version of
everything relevant and gpu_name, separate mesa+llvm does not suit all
drivers and is likely not future proof anyway.

Gražvydas

> For example radeonsi will contain an llvm_id string while
> i965 will not. Can we not just define a cache_key struct to be passed to
> disk_cache_create().
>
> struct cache_driver_keys {
>    char *mesa_id;
>    char *llvm_id;
>    char *gpu_id;
> }
>
> It does mean we can't just use cache_entry_file_data for these strings, but
> that's not a big deal. We do need to a least write 0 when llvm_id is NULL.
>
> As well as those 3 we would probably want to add ptr size to
> cache_entry_file_data.