[Mesa-dev] [PATCH 1/2] radeonsi: add support for an on-disk shader cache

Tue Feb 28 03:16:18 UTC 2017

On 27/02/17 23:20, Marek Olšák wrote:
> On Mon, Feb 27, 2017 at 10:57 AM, Michel Dänzer <michel at daenzer.net> wrote:
>> On 25/02/17 01:56 PM, Timothy Arceri wrote:
>>> On 24/02/17 21:02, Marek Olšák wrote:
>>>> On Fri, Feb 24, 2017 at 3:18 AM, Timothy Arceri
>>>> <tarceri at itsqueeze.com> wrote:
>>>>> On 24/02/17 08:49, Timothy Arceri wrote:
>>>>>> On 24/02/17 05:12, Marek Olšák wrote:
>>>>>>> On Thu, Feb 23, 2017 at 3:09 AM, Timothy Arceri
>>>>>>> <tarceri at itsqueeze.com> wrote:
>>>>>>>>
>>>>>>>> From: kdj0c <kdj0c at djinvi.net>
>>>>>>>>
>>>>>>>> V2 (Timothy Arceri):
>>>>>>>> - when loading from disk cache also binary insert into memory cache.
>>>>>>>> - check that the binary loaded from disk is the correct size. If not
>>>>>>>>   delete the cache item and skip loading from cache.
>>>>>>>> ---
>>>>>>>>  src/gallium/drivers/radeonsi/si_state_shaders.c | 69
>>>>>>>> ++++++++++++++++++++++---
>>>>>>>>  1 file changed, 62 insertions(+), 7 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c
>>>>>>>> b/src/gallium/drivers/radeonsi/si_state_shaders.c
>>>>>>>> index f615aa8..71556f9 100644
>>>>>>>> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
>>>>>>>> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
>>>>>>>> @@ -36,6 +36,9 @@
>>>>>>>>  #include "util/u_memory.h"
>>>>>>>>  #include "util/u_prim.h"
>>>>>>>>
>>>>>>>> +#include "util/disk_cache.h"
>>>>>>>> +#include "util/mesa-sha1.h"
>>>>>>>> +
>>>>>>>>  /* SHADER_CACHE */
>>>>>>>>
>>>>>>>>  /**
>>>>>>>> @@ -182,10 +185,12 @@ static bool si_load_shader_binary(struct
>>>>>>>> si_shader *shader, void *binary)
>>>>>>>>   */
>>>>>>>>  static bool si_shader_cache_insert_shader(struct si_screen *sscreen,
>>>>>>>>                                           void *tgsi_binary,
>>>>>>>> -                                         struct si_shader *shader)
>>>>>>>> +                                         struct si_shader *shader,
>>>>>>>> +                                         bool
>>>>>>>> insert_into_disk_cache)
>>>>>>>>  {
>>>>>>>>         void *hw_binary;
>>>>>>>>         struct hash_entry *entry;
>>>>>>>> +       uint8_t key[CACHE_KEY_SIZE];
>>>>>>>>
>>>>>>>>         entry = _mesa_hash_table_search(sscreen->shader_cache,
>>>>>>>> tgsi_binary);
>>>>>>>>         if (entry)
>>>>>>>> @@ -201,6 +206,12 @@ static bool si_shader_cache_insert_shader(struct
>>>>>>>> si_screen *sscreen,
>>>>>>>>                 return false;
>>>>>>>>         }
>>>>>>>>
>>>>>>>> +       if (sscreen->b.disk_shader_cache && insert_into_disk_cache) {
>>>>>>>> +               _mesa_sha1_compute(tgsi_binary, *((uint32_t
>>>>>>>> *)tgsi_binary), key);
>>>>>>>
>>>>>>>
>>>>>>> What happens if we randomly get a sha1 collision?
>>>>>>
>>>>>>
>>>>>> You should stop playing your game which will be rendering incorrectly
>>>>>> and by a lotto ticket.
>>>>>>
>>>>>>> Shouldn't we store the whole key as well?
>>>>>>
>>>>>>
>>>>>> Sure I can add that, its cheap to check here anyway. Although the other
>>>>>> cache stages rely on a collision being improbable.
>>>>>
>>>>>
>>>>>
>>>>> For some reason I thought the key was simpler than it is. It seems
>>>>> excessive
>>>>> to store and compare the tgsi again. I don't think git even worries
>>>>> about
>>>>> the possibility of a collision and we will be dealing with much smaller
>>>>> amounts of cache items then commits in a git repository.
>>>>>
>>>>> Thoughts?
>>>>
>>>> I'll let others comment on this. If nobody comments, checking only the
>>>> key can stay.
>>>
>>> Seems SVN didn't used to worry about collisions either.
>>>
>>> https://arstechnica.com/security/2017/02/watershed-sha1-collision-just-broke-the-webkit-repository-others-may-follow/
>>
>> What scenario are we concerned about here? Getting a genuine SHA1
>> collision between two shader objects is still as unlikely as it ever was
>> (virtually impossible?), and someone with access to the shader cache
>> files can wreak havoc much more easily than via a hash collision.
>
> I'm concerned about the random scenario where 2 shaders can have the
> same hash. That is, if the user doesn't do anything wrong and nobody
> else corrupts the cache, can we say for sure that a collision will
> never ever happen?

I was concerned about this when I began work on the shader cache but 
after digging around it became fairly clear that we should be able to 
trust the hash to do its job. We are going to be dealing with a cache 
with likely tens of thousands of shaders at most, and we also check the 
size of the cached item is what we were expecting when before using it.

The general answer to the question "Should I worry about collisions in 
sha1?" seems to be that you should be more worried about the earth being 
hit by a giant asteroid.

Or the more colourful note in this git documentation about collisions 
[4]. "A higher probability exists that every member of your programming 
team will be attacked and killed by wolves in unrelated incidents on the 
same night."

I'm no expert on secure hashs but I'll point out a few resources that 
I've come across.

In his response to upgrading git to use a different hash following the 
google post, Linus responed.

"I haven't seen the attack yet, but git doesn't actually just hash the
data, it does prepend a type/length field to it. That usually tends to
make collision attacks much harder, because you either have to make
the resulting size the same too, or you have to be able to also edit
the size field in the header.

pdf's don't have that issue, they have a fixed header and you can
fairly arbitrarily add silent data to the middle that just doesn't get
shown.

So pdf's make for a much better attack vector, exactly because they
are a fairly opaque data format."

Since we check the size of cache items and fallback if they don't match 
what we were expecting this makes the likelyhood of a collision 
happening and the data actually being used even more remote.

And without the size matching requirement and a know attack vector 
googles computation still took [3]:

"over 9,223,372,036,854,775,808 SHA1 computations. This took the 
equivalent processing power as 6,500 years of single-CPU computations 
and 110 years of single-GPU computations."

[3] Is a blog post that takes a look at
how hard it would be to get a collision in git.

Summary:

"Assume the world has 7 billion people and every person is a developer. 
If every person on Earth adds a new commit every second (our developers 
are assumed to never sleep!). At that rate, mankind would need nothing 
less than 6.6 million years to produce a number of commits large enough 
to create a hash collision with 50% probability!

In a more realistic scenario, a large project might have 1000 active 
developers who commit 10 commits per day. In these circumstances, it 
would take approximately 4×10^17 years for a collision to happen with 
50% probability. For comparison, the age of the universe is estimated to 
be 13.8×10^9 years."

[5]

[1] https://marc.info/?l=git&m=148787047422954&w=2
[2] https://shattered.io/
[3] https://diego.assencio.com/?index=eacd6eedf46c9dd596a5f12221ad15b8
[4] https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

>
> The patch can land as-is, but the topic is still open.
>
> Marek
>