[PATCH] drm/amdgpu: Revert "drm/amdgpu: stop allocating dummy GTT nodes"

Christian König christian.koenig at amd.com
Wed Sep 2 08:02:40 UTC 2020


Forget it, I see the problem now as well. Give me a second to provide a 
better fix.

Thanks for the hint,
Christian.

On 02.09.20 at 09:10, Christian König wrote:
> We got a bug report from upstream about this as well, but I couldn't 
> reproduce it so far.
>
> Why would we touch outside of the page table with this?
>
> Regards,
> Christian.
>
> On 02.09.20 at 05:43, xinhui pan wrote:
>> This reverts commit 1e691e2444871d1fde11b611653b5da9010dcec8.
>>
>> mem->mm_node can now be NULL with the commit above. That makes
>> amdgpu_vm_bo_split_mapping touch memory outside of the page table,
>> because max_entries is set to S64_MAX.
>>
>> Until that issue is fixed, revert the commit above.
>>
>> [  978.955925] BUG: unable to handle page fault for address: ffff94dfc4bc0000
>> [  978.963424] #PF: supervisor read access in kernel mode
>> [  978.969034] #PF: error_code(0x0000) - not-present page
>> [  978.974662] PGD 72e201067 P4D 72e201067 PUD 86a414067 PMD 86a3ee067 PTE 800ffff83b43f060
>> [  978.983494] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
>> [  978.988992] CPU: 0 PID: 12264 Comm: Xorg Tainted: G        W O      5.9.0-rc2+ #46
>> [  978.997394] Hardware name: System manufacturer System Product Name/PRIME Z390-A, BIOS 1401 11/26/2019
>> [  979.007495] RIP: 0010:amdgpu_vm_bo_update+0x5af/0x880 [amdgpu]
>> [  979.013881] Code: ff ff ff ff ff 7f 48 8b 45 c0 4c 8d 04 d8 b8 01 00 00 00 eb 09 48 83 c0 01 48 39 c2 76 12 49 8b 74 c0 f8 48 81 c6 00 10 00 00 <49> 39 34 c0 74 e5 8b 75 b4 4c 8b 45 c8 48 38
>> [  979.034354] RSP: 0018:ffffa94281403ba8 EFLAGS: 00010206
>> [  979.040050] RAX: 0000000000000200 RBX: 0000000000000e00 RCX: 00000000001049e8
>> [  979.047824] RDX: 7fffffffffffffff RSI: 00000007c5e00000 RDI: ffff94dfd5fc0000
>> [  979.055644] RBP: ffffa94281403c40 R08: ffff94dfc4bbf000 R09: 0000000000000001
>> [  979.063441] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000001047e8
>> [  979.071279] R13: 0000000000000000 R14: 00000000001047e9 R15: ffff94dfc4e9af48
>> [  979.079098] FS:  00007f19d3d00a80(0000) GS:ffff94e007e00000(0000) knlGS:0000000000000000
>> [  979.087911] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  979.094240] CR2: ffff94dfc4bc0000 CR3: 00000007c408c005 CR4: 00000000003706f0
>> [  979.102050] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  979.109868] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  979.117669] Call Trace:
>> [  979.120393]  amdgpu_gem_va_ioctl+0x533/0x560 [amdgpu]
>> [  979.125970]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
>> [  979.131914]  drm_ioctl_kernel+0xb4/0x100 [drm]
>> [  979.136792]  drm_ioctl+0x241/0x400 [drm]
>> [  979.141100]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
>> [  979.147003]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
>> [  979.152446]  ? trace_hardirqs_on+0x2b/0xf0
>> [  979.156977]  amdgpu_drm_ioctl+0x4e/0x80 [amdgpu]
>> [  979.162033]  __x64_sys_ioctl+0x91/0xc0
>> [  979.166117]  do_syscall_64+0x38/0x90
>> [  979.170022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  979.175537] RIP: 0033:0x7f19d405e37b
>> [  979.179450] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 08
>> [  979.200034] RSP: 002b:00007ffe66c9e938 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> [  979.208330] RAX: ffffffffffffffda RBX: 00007ffe66c9e990 RCX: 00007f19d405e37b
>> [  979.216147] RDX: 00007ffe66c9e990 RSI: 00000000c0286448 RDI: 0000000000000010
>> [  979.223897] RBP: 00000000c0286448 R08: 00000001039e9000 R09: 000000000000000e
>> [  979.231742] R10: 00005640dcedf010 R11: 0000000000000246 R12: 0000000000000000
>> [  979.239555] R13: 0000000000000010 R14: 0000000000000001 R15: 00007ffe66c9ea58
>> [  979.247358] Modules linked in: amdgpu(O) iommu_v2 gpu_sched(O) ttm(O) drm_kms_helper(O) cec i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt overlay binfmt_misc snd_sof_pci snd_sos
>> [  979.247375]  x_tables autofs4 crc32_pclmul e1000e i2c_i801 i2c_smbus ahci libahci wmi video pinctrl_cannonlake pinctrl_intel
>> [  979.354934] CR2: ffff94dfc4bc0000
>> [  979.358566] ---[ end trace 5b622843e4242519 ]---
>>
>> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 104 ++++++++++++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  18 +---
>>   2 files changed, 80 insertions(+), 42 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
>> index e1b66898cb76..295d6fbcda8f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
>> @@ -150,7 +150,60 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
>>    */
>>   bool amdgpu_gtt_mgr_has_gart_addr(struct ttm_mem_reg *mem)
>>   {
>> -    return mem->mm_node != NULL;
>> +    struct amdgpu_gtt_node *node = mem->mm_node;
>> +
>> +    return (node->node.start != AMDGPU_BO_INVALID_OFFSET);
>> +}
>> +
>> +/**
>> + * amdgpu_gtt_mgr_alloc - allocate new ranges
>> + *
>> + * @man: TTM memory type manager
>> + * @tbo: TTM BO we need this range for
>> + * @place: placement flags and restrictions
>> + * @mem: the resulting mem object
>> + *
>> + * Allocate the address space for a node.
>> + */
>> +static int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager *man,
>> +                struct ttm_buffer_object *tbo,
>> +                const struct ttm_place *place,
>> +                struct ttm_mem_reg *mem)
>> +{
>> +    struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
>> +    struct amdgpu_gtt_mgr *mgr = man->priv;
>> +    struct amdgpu_gtt_node *node = mem->mm_node;
>> +    enum drm_mm_insert_mode mode;
>> +    unsigned long fpfn, lpfn;
>> +    int r;
>> +
>> +    if (amdgpu_gtt_mgr_has_gart_addr(mem))
>> +        return 0;
>> +
>> +    if (place)
>> +        fpfn = place->fpfn;
>> +    else
>> +        fpfn = 0;
>> +
>> +    if (place && place->lpfn)
>> +        lpfn = place->lpfn;
>> +    else
>> +        lpfn = adev->gart.num_cpu_pages;
>> +
>> +    mode = DRM_MM_INSERT_BEST;
>> +    if (place && place->flags & TTM_PL_FLAG_TOPDOWN)
>> +        mode = DRM_MM_INSERT_HIGH;
>> +
>> +    spin_lock(&mgr->lock);
>> +    r = drm_mm_insert_node_in_range(&mgr->mm, &node->node, mem->num_pages,
>> +                    mem->page_alignment, 0, fpfn, lpfn,
>> +                    mode);
>> +    spin_unlock(&mgr->lock);
>> +
>> +    if (!r)
>> +        mem->start = node->node.start;
>> +
>> +    return r;
>>   }
>>     /**
>> @@ -181,37 +234,29 @@ static int amdgpu_gtt_mgr_new(struct ttm_mem_type_manager *man,
>>       atomic64_sub(mem->num_pages, &mgr->available);
>>       spin_unlock(&mgr->lock);
>>   -    if (!place->lpfn) {
>> -        mem->mm_node = NULL;
>> -        mem->start = AMDGPU_BO_INVALID_OFFSET;
>> -        return 0;
>> -    }
>> -
>>       node = kzalloc(sizeof(*node), GFP_KERNEL);
>>       if (!node) {
>>           r = -ENOMEM;
>>           goto err_out;
>>       }
>>   +    node->node.start = AMDGPU_BO_INVALID_OFFSET;
>> +    node->node.size = mem->num_pages;
>>       node->tbo = tbo;
>> -
>> -    spin_lock(&mgr->lock);
>> -    r = drm_mm_insert_node_in_range(&mgr->mm, &node->node, mem->num_pages,
>> -                    mem->page_alignment, 0, place->fpfn,
>> -                    place->lpfn, DRM_MM_INSERT_BEST);
>> -    spin_unlock(&mgr->lock);
>> -
>> -    if (unlikely(r))
>> -        goto err_free;
>> -
>>       mem->mm_node = node;
>> -    mem->start = node->node.start;
>>   -    return 0;
>> -
>> -err_free:
>> -    kfree(node);
>> +    if (place->fpfn || place->lpfn || place->flags & TTM_PL_FLAG_TOPDOWN) {
>> +        r = amdgpu_gtt_mgr_alloc(man, tbo, place, mem);
>> +        if (unlikely(r)) {
>> +            kfree(node);
>> +            mem->mm_node = NULL;
>> +            goto err_out;
>> +        }
>> +    } else {
>> +        mem->start = node->node.start;
>> +    }
>>   +    return 0;
>>   err_out:
>>       atomic64_add(mem->num_pages, &mgr->available);
>>   @@ -232,14 +277,17 @@ static void amdgpu_gtt_mgr_del(struct 
>> ttm_mem_type_manager *man,
>>       struct amdgpu_gtt_mgr *mgr = man->priv;
>>       struct amdgpu_gtt_node *node = mem->mm_node;
>>   -    if (node) {
>> -        spin_lock(&mgr->lock);
>> -        drm_mm_remove_node(&node->node);
>> -        spin_unlock(&mgr->lock);
>> -        kfree(node);
>> -    }
>> +    if (!node)
>> +        return;
>>   +    spin_lock(&mgr->lock);
>> +    if (node->node.start != AMDGPU_BO_INVALID_OFFSET)
>> +        drm_mm_remove_node(&node->node);
>> +    spin_unlock(&mgr->lock);
>>       atomic64_add(mem->num_pages, &mgr->available);
>> +
>> +    kfree(node);
>> +    mem->mm_node = NULL;
>>   }
>>     /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 63e541409549..cecda9d0d067 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -430,22 +430,12 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>       }
>>         src_offset = src->offset;
>> -    if (src->mem->mm_node) {
>> -        src_mm = amdgpu_find_mm_node(src->mem, &src_offset);
>> -        src_node_size = (src_mm->size << PAGE_SHIFT) - src_offset;
>> -    } else {
>> -        src_mm = NULL;
>> -        src_node_size = ULLONG_MAX;
>> -    }
>> +    src_mm = amdgpu_find_mm_node(src->mem, &src_offset);
>> +    src_node_size = (src_mm->size << PAGE_SHIFT) - src_offset;
>>         dst_offset = dst->offset;
>> -    if (dst->mem->mm_node) {
>> -        dst_mm = amdgpu_find_mm_node(dst->mem, &dst_offset);
>> -        dst_node_size = (dst_mm->size << PAGE_SHIFT) - dst_offset;
>> -    } else {
>> -        dst_mm = NULL;
>> -        dst_node_size = ULLONG_MAX;
>> -    }
>> +    dst_mm = amdgpu_find_mm_node(dst->mem, &dst_offset);
>> +    dst_node_size = (dst_mm->size << PAGE_SHIFT) - dst_offset;
>>         mutex_lock(&adev->mman.gtt_window_lock);
>
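
A minimal sketch of the failure mode described in the quoted commit
message, assuming the page table update scans an array of DMA addresses
for physically contiguous pages. This is not the amdgpu code:
count_contiguous, clamp_entries and SKETCH_PAGE_SIZE are hypothetical
names used only for illustration. With an upper bound of S64_MAX (compare
RDX = 7fffffffffffffff in the register dump), such a scan stops only at
the first non-contiguous entry, so on a fully contiguous buffer it can
read past the end of the array. Clamping the bound to the real number of
pages keeps it in range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size for the sketch (4 KiB, as on x86-64). */
#define SKETCH_PAGE_SIZE 0x1000ULL

/*
 * Count how many entries, starting at dma_addr[0], are physically
 * contiguous. The scan reads dma_addr[count] for count < max_entries,
 * so max_entries must not exceed the number of valid array entries.
 */
static uint64_t count_contiguous(const uint64_t *dma_addr,
                                 uint64_t max_entries)
{
    uint64_t count;

    for (count = 1; count < max_entries; ++count) {
        if (dma_addr[count] != dma_addr[count - 1] + SKETCH_PAGE_SIZE)
            break;
    }
    return count;
}

/*
 * Limit an "unbounded" max_entries (for example INT64_MAX when there is
 * no allocation node to derive a size from) to the number of pages that
 * actually back the buffer.
 */
static uint64_t clamp_entries(uint64_t max_entries, uint64_t num_pages)
{
    return max_entries < num_pages ? max_entries : num_pages;
}
```

Calling count_contiguous(addrs, clamp_entries(INT64_MAX, num_pages))
never reads beyond addrs[num_pages - 1], whereas passing INT64_MAX
directly relies on a non-contiguous entry appearing before the array
ends.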


