[PATCH] dma-fence: fix dma_fence_get_rcu_safe

Mon Sep 11 08:50:40 UTC 2017

Sorry for the delayed response, but your mail somehow ended up in the 
Spam folder.

Am 04.09.2017 um 15:40 schrieb Chris Wilson:
> Quoting Christian König (2017-09-04 14:27:33)
>> From: Christian König <christian.koenig at amd.com>
>>
>> The logic is buggy and unnecessary complex. When dma_fence_get_rcu() fails to
>> acquire a reference it doesn't necessary mean that there is no fence at all.
>>
>> It usually mean that the fence was replaced by a new one and in this situation
>> we certainly want to have the new one as result and *NOT* NULL.
> Which is not guaranteed by the code you wrote either.
>
> The point of the comment is that the mb is only inside the successful
> kref_atomic_inc_unless_zero, and that only after that mb do you know
> whether or not you have the current fence.
>
> You can argue that you want to replace the
> 	if (!dma_fence_get_rcu())
> 		return NULL
> with
> 	if (!dma_fence_get_rcu()
> 		continue;
> but it would be incorrect to say that by simply ignoring the
> post-condition check that you do have the right fence.

You are completely missing the point here.

It is irrelevant if you have the current fence or not when you return. 
You can only guarantee that it is the current fence when you take a look 
and that is exactly what we want to avoid.

So the existing code is complete nonsense. Instead what we need to 
guarantee is that we return *ANY* fence which we can grab a reference for.

See the usual life of a fence* variable looks like this:
1. assigning pointer to fence A;
2. assigning pointer to fence B;
3. assigning pointer to fence C;
....

When dma_fence_get_rcu_safe() is called between step #1 and step #2 for 
example it is perfectly valid to just return either fence A or fence B.

But it is invalid to return NULL because that suggests that we don't 
need to sync at all.

Regards,
Christian.