[PATCH v5 5/5] drm/amdgpu: move PD/PT bos on LRU again

Christian König ckoenig.leichtzumerken at gmail.com
Wed Aug 29 15:00:58 UTC 2018


Am 29.08.2018 um 16:51 schrieb Michel Dänzer:
> On 2018-08-29 10:57 a.m., Christian König wrote:
>> Am 29.08.2018 um 09:52 schrieb Michel Dänzer:
>>> On 2018-08-28 7:03 p.m., Michel Dänzer wrote:
>>>> On 2018-08-28 11:14 a.m., Michel Dänzer wrote:
>>>>> On 2018-08-22 9:52 a.m., Huang Rui wrote:
>>>>>> The new bulk moving functionality is ready, the overhead of moving
>>>>>> PD/PT bos to
>>>>>> LRU is fixed. So move them on LRU again.
>>>>>>
>>>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>>>> Tested-by: Mike Lothian <mike at fireburn.co.uk>
>>>>>> Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
>>>>>> Acked-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>> Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>> index db1f28a..d195a3d 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>> @@ -1107,7 +1107,7 @@ int amdgpu_vm_update_directories(struct
>>>>>> amdgpu_device *adev,
>>>>>>                           struct amdgpu_vm_bo_base,
>>>>>>                           vm_status);
>>>>>>            bo_base->moved = false;
>>>>>> -        list_del_init(&bo_base->vm_status);
>>>>>> +        list_move(&bo_base->vm_status, &vm->idle);
>>>>>>              bo = bo_base->bo->parent;
>>>>>>            if (!bo)
>>>>>>
>>>>> Since this change, I'm getting various badness when running piglit
>>>>> using
>>>>> radeonsi on Bonaire, see the attached dmesg excerpt.
>>>>>
>>>>> Reverting just this change on top of current amd-staging-drm-next
>>>>> avoids
>>>>> the problem.
>>>>>
>>>>> Looks like some list manipulation isn't sufficiently protected against
>>>>> concurrent execution?
>>>> KASAN pointed me to one issue:
>>>> https://patchwork.freedesktop.org/patch/246212/
>>>>
>>>> However, this doesn't fully fix the problem.
>>> Ray, any ideas yet for solving this? If not, let's revert this change
>>> for now.
>> I've gone over this multiple times now as well, but can't find anything
>> obvious wrong either.
> After looking at the code, one question: Why does vm->moved need a
> spinlock, but not vm->idle? What is protecting against concurrent access
> to the latter?

The moved state is occupied by both normal and per VM BOs, e.g. BOs with 
different reservation objects.

All other states are only used by per VM BOs or PDs/PTs, so we only put 
the BOs on those when the reservation object of the root BO is locked.

We could probably split the moved state into two separate ones to 
further avoid that lock.

Christian.


More information about the amd-gfx mailing list