[PATCH 1/4] drm/amdgpu: Optimization of critical section
Christian König
deathsimple at vodafone.de
Tue Jun 13 10:23:00 UTC 2017
On 13.06.2017 at 08:29, axie wrote:
> On 2017-06-12 07:00 PM, Christian König wrote:
>> On 12.06.2017 at 22:31, Alex Xie wrote:
>>> This patch is to move a loop of unref BOs and
>>> several memory free function calls out of
>>> critical sections.
>>>
>>> Signed-off-by: Alex Xie <AlexBin.Xie at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 5 ++++-
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> index a664987..02c138f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> @@ -75,9 +75,12 @@ static void amdgpu_bo_list_destroy(struct
>>> amdgpu_fpriv *fpriv, int id)
>>> /* Another user may have a reference to this list still */
>>> mutex_lock(&list->lock);
>>> mutex_unlock(&list->lock);
>>> + mutex_unlock(&fpriv->bo_list_lock);
>>> amdgpu_bo_list_free(list);
>>> }
>>> - mutex_unlock(&fpriv->bo_list_lock);
>>> + else {
>>> + mutex_unlock(&fpriv->bo_list_lock);
>>> + }
>>
>> You could move the unlock of bo_list_lock even before the if.
>>
>> But since you pointed it out there is quite a bug in this function:
>>> /* Another user may have a reference to this list
>>> still */
>>> mutex_lock(&list->lock);
>>> mutex_unlock(&list->lock);
>> Not sure if that is still up to date, but that use case used to be
>> illegal with mutexes.
> As I understand this piece of code, this mutex_lock/mutex_unlock
> pair is used to make sure all other tasks have finished
> accessing the bo list. The other side of this story is in function
> amdgpu_bo_list_get. These two pieces of code together make sure
> we can safely destroy the bo list.
Yeah, the idea behind the code is correct, but using mutexes in that way
is illegal; see https://lwn.net/Articles/575460/.
I'm not sure if that is still up to date, but in ancient times you
needed to avoid patterns like this:
mutex_unlock(&obj->lock);
kfree(obj);
Anyway, I suggest we just replace the whole bo_list handling with an RCU-
and refcount-based implementation. That should avoid the locking entirely
for the read-only code path.
Regards,
Christian.
>
> Otherwise we could easily simplify this locking.
> Let me give an example here.
>
> In function amdgpu_bo_list_get, assuming we change the code like this:
> ...
> down_read(&fpriv->bo_list_lock);
> result = idr_find(&fpriv->bo_list_handles, id);
> up_read(&fpriv->bo_list_lock);
> /**Line 1. Task A was scheduled away from CPU**/
> if (result)
> mutex_lock(&result->lock);
> ...
>
> In function amdgpu_bo_list_destroy, assuming we change the code like
> this:
> ...
> down_write(&fpriv->bo_list_lock);
> list = idr_remove(&fpriv->bo_list_handles, id);
> up_write(&fpriv->bo_list_lock);
> if (list) {
> /* Another user may have a reference to this list still */
> mutex_lock(&list->lock);
> mutex_unlock(&list->lock);
> amdgpu_bo_list_free(list);
> }
> }
>
> When task A is running in function amdgpu_bo_list_get at line 1, the CPU
> scheduler takes the CPU away from task A.
> Then task B runs function amdgpu_bo_list_destroy. Task B can run all
> the way through to destroying and freeing the mutex.
> Later, task A resumes. The mutex result->lock has already been destroyed
> by task B, so task A now tries to lock a mutex
> which has been destroyed and freed.
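The interleaving described above can be written out as a timeline (a simplified side-by-side view of the two hypothetical functions, not executable code):

```c
/* Task A (amdgpu_bo_list_get)       Task B (amdgpu_bo_list_destroy)
 *
 * down_read(&fpriv->bo_list_lock);
 * result = idr_find(..., id);
 * up_read(&fpriv->bo_list_lock);
 * <scheduled away at line 1>
 *                                   down_write(&fpriv->bo_list_lock);
 *                                   list = idr_remove(..., id);  // same list
 *                                   up_write(&fpriv->bo_list_lock);
 *                                   mutex_lock(&list->lock);     // uncontended
 *                                   mutex_unlock(&list->lock);
 *                                   amdgpu_bo_list_free(list);   // frees the mutex
 * <resumes>
 * mutex_lock(&result->lock);        // use-after-free
 */
```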
>
>
>>
>> Christian.
>>
>>> }
>>> static int amdgpu_bo_list_set(struct amdgpu_device *adev,
>>
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx