[Mesa-dev] Hash Collision Risk Maths with Salt values.

Peter Dolding oiaohm at gmail.com
Wed Mar 15 13:17:16 UTC 2017


Timothy Arceri tarceri at itsqueeze.com
> "Assume the world has 7 billion people and every person is a developer.
>  If every person on Earth adds a new commit every second (our developers
> are assumed to never sleep!). At that rate, mankind would need nothing
> less than 6.6 million years to produce a number of commits large enough
> to create a hash collision with 50% probability!

> In a more realistic scenario, a large project might have 1000 active
> developers who commit 10 commits per day. In these circumstances, it
> would take approximately 4×10^17 years for a collision to happen with
> 50% probability. For comparison, the age of the universe is estimated to
> be 13.8×10^9 years."

Sorry I missed this when it was on the Mesa-dev mailing list.

The information is right but the maths is wrong, because it misses an
effect that applies in the applications of hashes Mesa is talking
about: the effect that salt, in the cryptography sense, has on hash
values.

With SHA1 you start with 160 bits of uniqueness for the items hashed.

Let's say we add a build_id into the mix.   This changes per version.
So once you have released 2 different build_ids, how many bits of
uniqueness do you have if you place all the hashed values in one
directory?  Two build_ids means every item can now appear twice in the
same namespace.  The answer is now 159: you have just lost 1 bit of
uniqueness.  Or instead of 6.6 million years, only 3.3 million years.

Luckily it is easy to calculate: express the number of distinct salt
values as a power of 2, take that many bits off your hash, then
calculate your probability of collision with what is left.   Using a
512-bit hash makes this hard to exhaust; with a shorter hash, getting
down into thousands of years might not take long at all.
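The arithmetic in the quoted text and above can be sketched with the
standard birthday-bound approximation, p ~ 1 - exp(-k^2 / (2 * 2^n))
for k items in an n-bit space.  This is a minimal model for playing
with the numbers, not anything from the Mesa code:

```python
import math

def collision_probability(num_items, hash_bits):
    """Birthday bound: p ~= 1 - exp(-k^2 / (2 * 2^n)) for k items in an
    n-bit space.  expm1 keeps precision when the probability is tiny."""
    exponent = -(num_items ** 2) / (2.0 * 2.0 ** hash_bits)
    return -math.expm1(exponent)

def items_for_half_chance(hash_bits):
    """How many items give a ~50% chance of at least one collision."""
    return math.sqrt(2.0 * math.log(2.0) * 2.0 ** hash_bits)

# Full 160-bit SHA1: roughly 1.4e24 items before a 50% collision chance.
print(items_for_half_chance(160))
# Every bit of uniqueness lost to salting moves that threshold closer.
print(items_for_half_chance(159))
```

Feeding the quoted commit rates into collision_probability reproduces
the "millions of years" figures for an unsalted 160-bit hash.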

Not putting differently salted hashes in the same location also
reduces the odds of collision, because it avoids the salt effect.   So
if we give build_id and gpu_id their own directories, the build_id and
gpu_id salts have not affected the outcome as much: you still have all
the hash bits of uniqueness.  Giving build_id and gpu_id their own
directories means they have not subtracted bits from the hash.
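The difference between the two layouts can be sketched as below.  The
function names, path layout, and "shader source as key" framing are
made up for illustration; this is not Mesa's actual cache code:

```python
import hashlib
from pathlib import Path

def cache_path_salted(root, build_id, gpu_id, key):
    """Salting: build_id and gpu_id are mixed into the hash input, so
    every distinct (build_id, gpu_id) pair multiplies the number of
    entries sharing one directory's namespace."""
    h = hashlib.sha1(build_id.encode() + gpu_id.encode() + key).hexdigest()
    return root / h

def cache_path_partitioned(root, build_id, gpu_id, key):
    """Partitioning: build_id and gpu_id select a directory instead, so
    each directory holds hashes of the plain key with all 160 bits of
    the hash doing uniqueness work."""
    h = hashlib.sha1(key).hexdigest()
    return root / build_id / gpu_id / h

root = Path("/tmp/cache")
print(cache_path_salted(root, "build-a", "gpu0", b"shader source"))
print(cache_path_partitioned(root, "build-a", "gpu0", b"shader source"))
```

In the partitioned layout the same key always hashes to the same file
name; only the directory changes per build or GPU.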

So build_id and gpu_id values would be salts.

The fastest way to really undermine a hash you are depending on being
unique is to be careless with salt effects.   Add enough salt effects
and, when you run the maths, your hash might have no bits left, making
a collision likely.   It is possible to add enough salts to consume
even a 512-bit hash: with 32 salts of 256 (16^2) unique values each,
you have just made collisions likely with a 512-bit hash.    A hash
like SHA1 reaches the same failure point with 10 salts of 256 (16^2)
unique values each.   So each individual salt's damage to the hash
does not have to be much, but cumulatively it can be quite a large
problem.
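Under the bit-accounting used above, where each independent salt with
V possible values costs log2(V) bits and collisions become likely once
only n/2 bits remain, the two examples work out like this:

```python
import math

def bits_consumed(num_salts, values_per_salt):
    """Bits of the hash eaten by salting, in the accounting used above:
    each independent salt with V possible values costs log2(V) bits."""
    return num_salts * int(math.log2(values_per_salt))

# 32 salts of 256 values each eat 256 bits: exactly half of a 512-bit
# hash, i.e. down to the birthday bound where collisions become likely.
print(bits_consumed(32, 256))   # 256
# 10 salts of 256 values each eat 80 bits: half of SHA1's 160 bits.
print(bits_consumed(10, 256))   # 80
```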

The numbers for git are different because there is no salt in the mathematics.

Salt in cryptography for protecting passwords: there it is really not
a problem if two users end up with the same hash, because with
per-user salts the salt is applied when the password is entered, so
only a valid password will work.

Salt, when you want to use a hash to uniquely identify something, can
turn into your absolute worst enemy.   Particularly if, over time,
people keep adding this salt value and that salt value without knowing
they are step by step destroying the uniqueness of the hash.

This is why I wrote, answering on the bugzilla, about using
directories.    If you don't salt the hash, you have every bit of the
hash for uniqueness, so you keep the odds of some form of collision
low.

Failure to take due care with salting the hash:
1. makes collisions random.
2. makes collisions far more likely.
3. if you are really not careful, can make them absolutely likely.

I am not worried about something attempting to brute force and cause a
collision on demand.   Salting a hash does not make this any simpler.
The odds of that happening by brute force against the current hashes
are insanely low.    If it ever did become a problem, the salt you
would want is something truly system unique, not something like a
build_id or gpu_id.   So the current salts are the wrong kind of salt
for that problem.

What I am more worried about is the so-called Birthday attack
happening due to carelessness with salting: you could make collisions
likely because the salts have taken out too many of the hash bits.
What are you going to do if Mesa ends up over-salted for some reason,
the Birthday attack goes from unlikely to likely, and you now have two
different instances wanting to use the same hash value?   Remember,
just as I pointed out, I can be running more than one architecture at
a time, a person could be running more than one GPU type, and so on.
So you cannot say the currently running version wins.

It is better to have a policy against multi-salting hashes whose goal
is to be unique than to wake up and find that 32 different developers
have each done it in a small way, resulting in a non-unique hash.

> Since we check the size of cache items and fallback if they don't match
> what we were expecting this makes the likelyhood of a collision
> happening and the data actually being used even more remote.

This is dealing with a collision event, not avoiding one.   You are
caching to gain performance, and this says we will drop the
performance gains in the case of a collision because collisions are
rare at this stage.

Good design and policy will keep hash collisions insanely rare.   Bad
design could make hash collisions insanely likely.   One of the
biggest risks with using a hash for uniqueness is accidentally
poisoning it with salts from your own code, so it is nowhere near as
unique as you are expecting.

The salt effect is commonly forgotten, and it does at times lead to
some very strange errors, with people scratching their heads over how
they are having collisions with hashes where it should be next to
impossible.   Yes, over-salt even the largest hash and you can end up
with collisions that are totally avoidable by correct design.

Peter Dolding
