[PATCH] drm/amdgpu: set bulk_moveable to false when a per VM BO is released

Christian König christian.koenig at amd.com
Mon Sep 10 06:54:24 UTC 2018


On 10.09.2018 at 08:19, Huang Rui wrote:
> On Sun, Sep 09, 2018 at 06:38:13PM +0800, StDenis, Tom wrote:
>> On 2018-09-08 5:12 a.m., Huang Rui wrote:
>>> On Wed, Sep 05, 2018 at 05:08:26PM +0200, Christian König wrote:
>>>> Otherwise we might run into a use after free during bulk move.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> Does this patch fix the following KASAN report?
>>> [   66.143009] ==================================================================
>>> [   66.143254] BUG: KASAN: use-after-free in ttm_bo_bulk_move_lru_tail+0x2b/0x100 [ttm]
>>> [   66.143263] Read of size 8 at addr ffff8801f193d550 by task gnome-shel:cs0/4194
>>>
>>> Tom, may we have your tested-by?
>>>
>>> Reviewed-by: Huang Rui <ray.huang at amd.com>
>> Hi Ray,
>>
>> I tested this patch and it failed to survive a piglit run.  The only
>> fix so far was to completely disable bulk moves with this:
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index ea5e277ae038..ab244a726ad9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -397,7 +397,7 @@ void amdgpu_vm_move_to_lru_tail(struct amdgpu_device *adev,
>>           }
>>           spin_unlock(&glob->lru_lock);
>>
>> -       vm->bulk_moveable = true;
>> +//     vm->bulk_moveable = true;
>>    }
>>
>>    /**
>>
> Thanks, Tom.
> I enabled KASAN with the compiler instrumentation type set to outline, but the
> module fails to load with a general protection fault. Did I miss something?

The whole kernel needs to be compiled with the same KASAN setting (either
enabled everywhere or disabled everywhere); otherwise it won't work correctly.

Just rebuilding and installing the amdgpu module with KASAN enabled is
usually not enough.

Christian.

>
> [   85.348249] calling  drm_core_init+0x0/0xde [drm] @ 1391
> [   85.353763] initcall drm_core_init+0x0/0xde [drm] returned 0 after 78 usecs
> [   85.376264] calling  ttm_init+0x0/0x1000 [ttm] @ 1391
> [   85.381488] initcall ttm_init+0x0/0x1000 [ttm] returned 0 after 92 usecs
> [   85.407897] general protection fault: 0000 [#1] SMP KASAN PTI
> [   85.413751] CPU: 0 PID: 1391 Comm: modprobe Not tainted 4.19.0-rc1-custom #1
> [   85.420900] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016
> [   85.430374] RIP: 0010:memset_erms+0x9/0x10
> [   85.434559] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
> [   85.453641] RSP: 0018:ffff8803dea27cf8 EFLAGS: 00010202
> [   85.458955] RAX: 1ffffffff8174800 RBX: ffffffffc0ba4040 RCX: 1ffffffff8174808
> [   85.466201] RDX: 1ffffffff8174808 RSI: 0000000000000000 RDI: dffffc0000000000
> [   85.473462] RBP: 0000000000000000 R08: ffff8803cf752f88 R09: dffffc0000000000
> [   85.480751] R10: 0000000000000007 R11: 00000000ef150e75 R12: ffffffffc0bb6000
> [   85.488038] R13: 0000000000000002 R14: ffffffffc0ba4040 R15: ffffffffc0bb9a00
> [   85.495319] FS:  00007f50d35c9700(0000) GS:ffff8803ee800000(0000) knlGS:0000000000000000
> [   85.503535] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   85.509386] CR2: 00007fffa12bc6f8 CR3: 00000003e15c6004 CR4: 00000000003606f0
> [   85.516630] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   85.523893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   85.531183] Call Trace:
> [   85.533672]  kasan_unpoison_shadow+0xf/0x30
>
> Thanks,
> Ray
>
>> Tom
>>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++++
>>>>    1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index ea5e277ae038..ed1e6abda391 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -2513,8 +2513,12 @@ void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
>>>>    		      struct amdgpu_bo_va *bo_va)
>>>>    {
>>>>    	struct amdgpu_bo_va_mapping *mapping, *next;
>>>> +	struct amdgpu_bo *bo = bo_va->base.bo;
>>>>    	struct amdgpu_vm *vm = bo_va->base.vm;
>>>>    
>>>> +	if (bo && bo->tbo.resv == vm->root.base.bo->tbo.resv)
>>>> +		vm->bulk_moveable = false;
>>>> +
>>>>    	list_del(&bo_va->base.bo_list);
>>>>    
>>>>    	spin_lock(&vm->invalidated_lock);
>>>> -- 
>>>> 2.17.1
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
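
For readers following the quoted patch, the check it adds to
amdgpu_vm_bo_rmv() can be modelled in user space roughly as below. This is
only a sketch of the idea (a BO counts as "per VM" when it shares the
reservation object of the VM root BO, and removing such a BO clears
bulk_moveable so that cached bulk-move state is rebuilt). All model_* names
are hypothetical simplifications, not the amdgpu structures, and the sketch
does not claim that this check alone resolves the KASAN report discussed in
this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reservation object; only its identity matters. */
struct model_resv {
	int unused;
};

/* Hypothetical stand-in for a buffer object. */
struct model_bo {
	struct model_resv *resv;
};

/* Hypothetical stand-in for a VM that caches bulk-move state. */
struct model_vm {
	struct model_bo *root;	/* root BO; per-VM BOs share its resv */
	bool bulk_moveable;	/* true while the cached state may be reused */
};

/* A BO is treated as "per VM" when it shares the root BO's reservation object. */
static bool model_bo_is_per_vm(const struct model_vm *vm,
			       const struct model_bo *bo)
{
	return bo && bo->resv == vm->root->resv;
}

/*
 * Models the check from the quoted patch: removing a per-VM BO clears
 * bulk_moveable, so a later bulk move cannot rely on state that may still
 * refer to the removed BO. A NULL bo or a BO with its own reservation
 * object leaves the flag untouched.
 */
static void model_bo_rmv(struct model_vm *vm, struct model_bo *bo)
{
	assert(vm && vm->root);

	if (model_bo_is_per_vm(vm, bo))
		vm->bulk_moveable = false;
}
```

In this model, removing a BO that has its own reservation object (or passing
NULL) leaves bulk_moveable unchanged; only removing a BO that shares the
root's reservation object clears it.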


