[PATCH v5 5/5] drm/amdgpu: move PD/PT bos on LRU again

Michel Dänzer michel at daenzer.net
Wed Aug 29 14:51:32 UTC 2018


On 2018-08-29 10:57 a.m., Christian König wrote:
> Am 29.08.2018 um 09:52 schrieb Michel Dänzer:
>> On 2018-08-28 7:03 p.m., Michel Dänzer wrote:
>>> On 2018-08-28 11:14 a.m., Michel Dänzer wrote:
>>>> On 2018-08-22 9:52 a.m., Huang Rui wrote:
>>>>> The new bulk moving functionality is ready, the overhead of moving
>>>>> PD/PT bos to
>>>>> LRU is fixed. So move them on LRU again.
>>>>>
>>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>>> Tested-by: Mike Lothian <mike at fireburn.co.uk>
>>>>> Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
>>>>> Acked-by: Chunming Zhou <david1.zhou at amd.com>
>>>>> Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> index db1f28a..d195a3d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> @@ -1107,7 +1107,7 @@ int amdgpu_vm_update_directories(struct
>>>>> amdgpu_device *adev,
>>>>>                          struct amdgpu_vm_bo_base,
>>>>>                          vm_status);
>>>>>           bo_base->moved = false;
>>>>> -        list_del_init(&bo_base->vm_status);
>>>>> +        list_move(&bo_base->vm_status, &vm->idle);
>>>>>             bo = bo_base->bo->parent;
>>>>>           if (!bo)
>>>>>
>>>> Since this change, I'm getting various badness when running piglit
>>>> using
>>>> radeonsi on Bonaire, see the attached dmesg excerpt.
>>>>
>>>> Reverting just this change on top of current amd-staging-drm-next
>>>> avoids
>>>> the problem.
>>>>
>>>> Looks like some list manipulation isn't sufficiently protected against
>>>> concurrent execution?
>>> KASAN pointed me to one issue:
>>> https://patchwork.freedesktop.org/patch/246212/
>>>
>>> However, this doesn't fully fix the problem.
>> Ray, any ideas yet for solving this? If not, let's revert this change
>> for now.
> 
> I've gone over this multiple times now as well, but can't find anything
> obvious wrong either.

After looking at the code, one question: Why does vm->moved need a
spinlock, but not vm->idle? What is protecting against concurrent access
to the latter?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer


More information about the amd-gfx mailing list