[PATCH v5 5/5] drm/amdgpu: move PD/PT bos on LRU again

Christian König ckoenig.leichtzumerken at gmail.com
Wed Aug 29 08:57:52 UTC 2018


Am 29.08.2018 um 09:52 schrieb Michel Dänzer:
> On 2018-08-28 7:03 p.m., Michel Dänzer wrote:
>> On 2018-08-28 11:14 a.m., Michel Dänzer wrote:
>>> On 2018-08-22 9:52 a.m., Huang Rui wrote:
>>>> The new bulk moving functionality is ready, the overhead of moving PD/PT bos to
>>>> LRU is fixed. So move them on LRU again.
>>>>
>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>> Tested-by: Mike Lothian <mike at fireburn.co.uk>
>>>> Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
>>>> Acked-by: Chunming Zhou <david1.zhou at amd.com>
>>>> Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index db1f28a..d195a3d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -1107,7 +1107,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
>>>>   					   struct amdgpu_vm_bo_base,
>>>>   					   vm_status);
>>>>   		bo_base->moved = false;
>>>> -		list_del_init(&bo_base->vm_status);
>>>> +		list_move(&bo_base->vm_status, &vm->idle);
>>>>   
>>>>   		bo = bo_base->bo->parent;
>>>>   		if (!bo)
>>>>
>>> Since this change, I'm getting various badness when running piglit using
>>> radeonsi on Bonaire, see the attached dmesg excerpt.
>>>
>>> Reverting just this change on top of current amd-staging-drm-next avoids
>>> the problem.
>>>
>>> Looks like some list manipulation isn't sufficiently protected against
>>> concurrent execution?
>> KASAN pointed me to one issue:
>> https://patchwork.freedesktop.org/patch/246212/
>>
>> However, this doesn't fully fix the problem.
> Ray, any ideas yet for solving this? If not, let's revert this change
> for now.

I've gone over this multiple times now as well, but can't find anything 
obvious wrong either.

If we don't have any more ideas I would say revert it for now and try to 
debug it further.

BTW: Any idea how to force the issue?

Christian.


More information about the amd-gfx mailing list