[PATCH 4/4] drm/amdgpu: Optimize mutex usage

Christian König deathsimple at vodafone.de
Tue Jun 13 10:07:46 UTC 2017


Am 13.06.2017 um 08:03 schrieb axie:
> Hi Christian,
>
> Regarding comment 1.
>
> Please read the file idr.c, lines 24 to 29.
>
> Let me quote it here. Doesn't it surprise you? idr_alloc only needs a read lock.
>
> ...
>
>  * Simultaneous modifications to the @idr are not allowed and should be
>  * prevented by the user, usually with a lock.  idr_alloc() may be called
>  * concurrently with read-only accesses to the @idr, such as 
> idr_find() and
>  * idr_for_each_entry().
>  */
> int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp)

As far as I can see you misunderstood the comment.

idr_alloc() can be called concurrently with idr_find() and 
idr_for_each_entry(), so you don't need a lock to cover those, as long 
as you manage the lifetime of the mapped pointers with RCU for example.

But you still need to prevent idr_alloc() from being called concurrently 
from multiple threads at the same time, so a read-side lock isn't 
sufficient here.
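
To make that concrete, here is a minimal sketch (the helper names 
bo_list_insert() and bo_list_lookup() are made up, only the fields follow 
your patch): the lookup can stay on the read side, but the allocation has 
to take the write side of the rwsem.

/*
 * Sketch only: idr_alloc() modifies the tree, so writers must be
 * serialized with down_write(); lookups like idr_find() can share
 * the lock with down_read().
 */
static int bo_list_insert(struct amdgpu_fpriv *fpriv,
                          struct amdgpu_bo_list *list)
{
        int r;

        down_write(&fpriv->bo_list_lock);  /* exclusive: idr_alloc() is a modification */
        r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
        up_write(&fpriv->bo_list_lock);

        return r;
}

static struct amdgpu_bo_list *bo_list_lookup(struct amdgpu_fpriv *fpriv,
                                             int id)
{
        struct amdgpu_bo_list *list;

        down_read(&fpriv->bo_list_lock);   /* shared: lookups may run in parallel */
        list = idr_find(&fpriv->bo_list_handles, id);
        up_read(&fpriv->bo_list_lock);

        return list;
}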

>
> ...
>
> Regarding comment 2.
>
> I have checked the relevant information about rw_semaphore. A read lock 
> is about as efficient as a mutex.
>
> A write lock is less efficient than a mutex if "optimistic spinning on 
> lock acquisition" is not configured.
>
> But we only use the write lock when destroying a bo list. Most of the 
> time, as you said, there is no contention.
>
> So the write lock does not need to wait, and the lower efficiency is not 
> a factor compared with a mutex.
>
>
> To decide which lock is better, choosing the correct locking mechanism 
> matters more than the efficiency of the lock itself. With the current 
> bo_list, multiple readers can block each other.
>
> And what is even worse, the get function can wait on other things while 
> holding the mutex.

Actually it would be more efficient to not use a lock at all and 
properly reference count the objects instead.

Updates can then be made using RCU.
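
Something like the following, completely untested and only meant to 
illustrate the direction; the refcount member and amdgpu_bo_list_free_rcu() 
don't exist yet and would have to be added, and the freeing would have to 
go through an RCU grace period (e.g. kfree_rcu()).

/*
 * Rough sketch (needs <linux/kref.h> and <linux/rcupdate.h>): the idr is
 * only locked for modifications, lookups run under rcu_read_lock() and
 * take a reference with kref_get_unless_zero().  The release callback
 * amdgpu_bo_list_free_rcu() would free the list after a grace period.
 */
struct amdgpu_bo_list *amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id)
{
        struct amdgpu_bo_list *result;

        rcu_read_lock();
        result = idr_find(&fpriv->bo_list_handles, id);
        if (result && !kref_get_unless_zero(&result->refcount))
                result = NULL;  /* the list is already being destroyed */
        rcu_read_unlock();

        return result;
}

void amdgpu_bo_list_put(struct amdgpu_bo_list *list)
{
        kref_put(&list->refcount, amdgpu_bo_list_free_rcu);
}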

> If you are saying that the user-mode driver is single-threaded, then we 
> don't need any mutex here at all...

No, even when userspace doesn't call the functions from multiple threads, 
IOCTLs still need to be thread-safe to the extent that they don't crash 
when you do so anyway.

Regards,
Christian.

>
>
> -Alex Bin Xie
>
>
>
> On 2017-06-12 06:54 PM, Christian König wrote:
>> Am 12.06.2017 um 22:31 schrieb Alex Xie:
>>> Use rw_semaphore instead of mutex for bo_lists.
>>>
>>> In the original amdgpu_bo_list_get, the wait for
>>> result->lock can be quite long while the mutex
>>> bo_list_lock is held. That can make other tasks
>>> wait a long time for bo_list_lock too.
>>> Changing bo_list_lock to an rw_semaphore avoids
>>> most of this long waiting.
>>>
>>> Secondly, this patch allows several tasks (readers of the idr)
>>> to proceed at the same time.
>>>
>>> Signed-off-by: Alex Xie <AlexBin.Xie at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  3 ++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 14 +++++++-------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c     |  3 +--
>>>   3 files changed, 10 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 063fc73..bfc40d7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -35,6 +35,7 @@
>>>   #include <linux/rbtree.h>
>>>   #include <linux/hashtable.h>
>>>   #include <linux/dma-fence.h>
>>> +#include <linux/rwsem.h>
>>>     #include <ttm/ttm_bo_api.h>
>>>   #include <ttm/ttm_bo_driver.h>
>>> @@ -859,7 +860,7 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr 
>>> *mgr);
>>>   struct amdgpu_fpriv {
>>>       struct amdgpu_vm    vm;
>>>       struct amdgpu_bo_va    *prt_va;
>>> -    struct mutex        bo_list_lock;
>>> +    struct rw_semaphore    bo_list_lock;
>>>       struct idr        bo_list_handles;
>>>       struct amdgpu_ctx_mgr    ctx_mgr;
>>>       u32            vram_lost_counter;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> index 4363f28..bfe736e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> @@ -45,10 +45,10 @@ static int amdgpu_bo_list_create(struct 
>>> amdgpu_fpriv *fpriv,
>>>       if (!*result)
>>>           return -ENOMEM;
>>>   -    mutex_lock(&fpriv->bo_list_lock);
>>> +    down_read(&fpriv->bo_list_lock);
>>>       r = idr_alloc(&fpriv->bo_list_handles, *result,
>>>                 1, 0, GFP_KERNEL);
>>> -    mutex_unlock(&fpriv->bo_list_lock);
>>> +    up_read(&fpriv->bo_list_lock);
>>
>> The protection here is incorrect, you need to take the write side 
>> here as well.
>>
>>>         if (r < 0) {
>>>           kfree(*result);
>>> @@ -67,17 +67,17 @@ static void amdgpu_bo_list_destroy(struct 
>>> amdgpu_fpriv *fpriv, int id)
>>>   {
>>>       struct amdgpu_bo_list *list;
>>>   -    mutex_lock(&fpriv->bo_list_lock);
>>> +    down_write(&fpriv->bo_list_lock);
>>>       list = idr_remove(&fpriv->bo_list_handles, id);
>>>       if (list) {
>>>           /* Another user may have a reference to this list still */
>>>           mutex_lock(&list->lock);
>>>           mutex_unlock(&list->lock);
>>> -        mutex_unlock(&fpriv->bo_list_lock);
>>> +        up_write(&fpriv->bo_list_lock);
>>>           amdgpu_bo_list_free(list);
>>>       }
>>>       else {
>>> -        mutex_unlock(&fpriv->bo_list_lock);
>>> +        up_write(&fpriv->bo_list_lock);
>>>       }
>>>   }
>>>   @@ -173,11 +173,11 @@ amdgpu_bo_list_get(struct amdgpu_fpriv 
>>> *fpriv, int id)
>>>   {
>>>       struct amdgpu_bo_list *result;
>>>   -    mutex_lock(&fpriv->bo_list_lock);
>>> +    down_read(&fpriv->bo_list_lock);
>>>       result = idr_find(&fpriv->bo_list_handles, id);
>>>       if (result)
>>>           mutex_lock(&result->lock);
>>> -    mutex_unlock(&fpriv->bo_list_lock);
>>> +    up_read(&fpriv->bo_list_lock);
>>
>> Actually rw_semaphores are a bit less efficient than mutexes, and the 
>> contention on this lock is practically non-existent, since at least the 
>> open source stack issues all BO list operations from a single thread.
>>
>> So I'm not sure if that is actually more efficient.
>>
>> Christian.
>>
>>>       return result;
>>>   }
>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> index f68ced6..c4cba81 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> @@ -841,7 +841,7 @@ int amdgpu_driver_open_kms(struct drm_device 
>>> *dev, struct drm_file *file_priv)
>>>               goto out_suspend;
>>>       }
>>>   -    mutex_init(&fpriv->bo_list_lock);
>>> +    init_rwsem(&fpriv->bo_list_lock);
>>>       idr_init(&fpriv->bo_list_handles);
>>>         amdgpu_ctx_mgr_init(&fpriv->ctx_mgr);
>>> @@ -900,7 +900,6 @@ void amdgpu_driver_postclose_kms(struct 
>>> drm_device *dev,
>>>           amdgpu_bo_list_free(list);
>>>         idr_destroy(&fpriv->bo_list_handles);
>>> -    mutex_destroy(&fpriv->bo_list_lock);
>>>         kfree(fpriv);
>>>       file_priv->driver_priv = NULL;
>>
>>
>



