ttm crash on init

Christian König ckoenig.leichtzumerken at gmail.com
Mon Sep 17 18:08:25 UTC 2018


On 17.09.2018 at 20:01, Tom St Denis wrote:
> On 2018-09-17 1:55 p.m., Christian König wrote:
>> On 17.09.2018 at 19:50, Tom St Denis wrote:
>>> On 2018-09-17 1:45 p.m., Christian König wrote:
>>>> Mhm, not the slightest idea.
>>>>
>>>> That almost looks as if adev->stolen_vga_memory already contains
>>>> something.
>>>
>>> Nope,
>>>
>>> [   51.564605] >>>adev->stolen_vga_memory == (null)
>>> [   51.564619] kasan: CONFIG_KASAN_INLINE enabled
>>> [   51.564877] kasan: GPF could be caused by NULL-ptr deref or user 
>>> memory access
>>> [   51.565071] general protection fault: 0000 [#1] SMP 
>>> DEBUG_PAGEALLOC KASAN NOPTI
>>> [   51.565254] CPU: 6 PID: 3863 Comm: modprobe Not tainted 
>>> 4.19.0-rc1+ #30
>>> [   51.565425] Hardware name: System manufacturer System Product 
>>> Name/TUF B350M-PLUS GAMING, BIOS 4011 04/19/2018
>>> [   51.565714] RIP: 0010:amdgpu_bo_create_kernel+0x59/0x1a0 [amdgpu]
>>>
>>> That's me printing out the value of stolen_vga_memory before the
>>> call to allocate it.
>>
>> What does amdgpu_bo_create_kernel+0x59 point to?
>
> I've never really gotten line numbers to work with the kernel, but if
> I had to guess I'd say right here:
>
> int amdgpu_bo_create_kernel(struct amdgpu_device *adev,
>                 unsigned long size, int align,
>                 u32 domain, struct amdgpu_bo **bo_ptr,
>                 u64 *gpu_addr, void **cpu_addr)
> {
>     int r;
>
>     r = amdgpu_bo_create_reserved(adev, size, align, domain, bo_ptr,
>                       gpu_addr, cpu_addr);
>
>     if (r)
>         return r;
>
> *bo_ptr is NULL ===>    amdgpu_bo_unreserve(*bo_ptr);

Ah, of course! Thanks for pointing out the obvious; I totally forgot that
there is still another function in the call chain.
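
For illustration, here is a minimal userspace sketch of the kind of guard
this failure calls for. It is not the patch posted to the list. Every
model_* name is a hypothetical stand-in for the kernel type or function it
mimics, and it assumes that a zero-size request returns success while
leaving *bo_ptr NULL, which is what the trace above suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_tbo {
	int reserved;
};

struct model_bo {
	long padding;		/* puts tbo at a non-zero offset */
	struct model_tbo tbo;
};

/* Models ttm_bo_unreserve(): dereferences its argument unconditionally. */
static void model_tbo_unreserve(struct model_tbo *tbo)
{
	tbo->reserved = 0;
}

/*
 * Models amdgpu_bo_unreserve(). &bo->tbo is bo + offsetof(tbo), so a NULL
 * bo would hand a small bogus address to the callee.
 */
static void model_bo_unreserve(struct model_bo *bo)
{
	assert(bo != NULL);	/* the precondition the crashing path violated */
	model_tbo_unreserve(&bo->tbo);
}

/*
 * Models amdgpu_bo_create_reserved() under the stated assumption:
 * size == 0 succeeds but creates nothing and leaves *bo_ptr NULL.
 */
static int model_create_reserved(unsigned long size, struct model_bo *storage,
				 struct model_bo **bo_ptr)
{
	if (size == 0) {
		*bo_ptr = NULL;
		return 0;
	}
	storage->tbo.reserved = 1;
	*bo_ptr = storage;
	return 0;
}

/* Models amdgpu_bo_create_kernel() with a NULL guard before unreserving. */
static int model_create_kernel(unsigned long size, struct model_bo *storage,
			       struct model_bo **bo_ptr)
{
	int r = model_create_reserved(size, storage, bo_ptr);

	if (r)
		return r;

	if (*bo_ptr)		/* nothing was created, so nothing to unreserve */
		model_bo_unreserve(*bo_ptr);

	return 0;
}
```

Calling model_create_kernel() with size 0 returns 0 and leaves the pointer
NULL without touching it; with a non-zero size the object is created and
then unreserved as before.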

Patch to fix is on the list,
Christian.

>
>     return 0;
> }
>
> Which then results in
>
> static inline void amdgpu_bo_unreserve(struct amdgpu_bo *bo)
> {
>     ttm_bo_unreserve(&bo->tbo);
> }
>
> Which then passes the address NULL + offsetof(tbo) to ttm_bo_unreserve:
>
> static inline void ttm_bo_unreserve(struct ttm_buffer_object *bo)
> {
>         if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>                 spin_lock(&bo->bdev->glob->lru_lock);
>                 ttm_bo_add_to_lru(bo);
>                 spin_unlock(&bo->bdev->glob->lru_lock);
>         }
>         reservation_object_unlock(bo->resv);
> }
>
>
> Which likely faults on reading bo->mem.placement since the address is 
> bogus.
>
> The report is from amdgpu_bo_create_kernel because everything is a 
> macro or inlined... :-)
>
> Tom
>
>>
>> Christian.
>>
>>>
>>> Tom
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>> On 17.09.2018 at 18:47, Tom St Denis wrote:
>>>>> On 2018-09-17 12:21 p.m., Tom St Denis wrote:
>>>>>> (attached).  I'll try to bisect in a second.  Is anyone aware of 
>>>>>> this?
>>>>>>
>>>>>> Tom
>>>>>
>>>>> Bisection led to:
>>>>>
>>>>> a327772a5655ff4fb104c8aae6515faa461df466 is the first bad commit
>>>>> commit a327772a5655ff4fb104c8aae6515faa461df466
>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>> Date:   Fri Sep 14 21:06:50 2018 +0200
>>>>>
>>>>>     drm/amdgpu: drop size check
>>>>>
>>>>>     We no don't allocate zero sized kernel BOs any longer.
>>>>>
>>>>>     Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>     Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>
>>>>> :040000 040000 265e4fa231d367d354e4c66600b8f98a4d2f04c4 
>>>>> 3702baaeb2423361dcd7eac8c533edace760ae3e M      drivers
>>>>>
>>>>>
>>>>> As the culprit.
>>>>>
>>>>> Cheers,
>>>>> Tom
>>>>
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>


