[PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3

Liang, Prike Prike.Liang at amd.com
Thu May 23 08:27:58 UTC 2019


Hi Christian,

Thanks for your patches. They fix the amdgpu BO pin failure seen in dm_plane_helper_prepare_fb,
and Abaqus performance seems improved.

However, some error messages can still be observed. Do we need to drop the amdgpu_vm_validate_pt_bos() error message
and the other warning/debug output?

[ 1910.674541] Call Trace:
[ 1910.676944]  [<ffffffff8b361dc1>] dump_stack+0x19/0x1b
[ 1910.682236]  [<ffffffff8ac97648>] __warn+0xd8/0x100
[ 1910.687195]  [<ffffffff8ac9778d>] warn_slowpath_null+0x1d/0x20
[ 1910.693167]  [<ffffffffc0603619>] amdgpu_bo_move+0x169/0x1c0 [amdgpu]
[ 1910.699719]  [<ffffffffc05c82bb>] ttm_bo_handle_move_mem+0x26b/0x5d0 [amdttm]
[ 1910.706976]  [<ffffffffc05c8767>] ttm_bo_evict+0x147/0x3b0 [amdttm]
[ 1910.713358]  [<ffffffffc04e88d9>] ? drm_mm_insert_node_in_range+0x299/0x4d0 [drm]
[ 1910.720881]  [<ffffffffc057652e>] ? _kcl_reservation_object_reserve_shared+0xfe/0x1a0 [amdkcl]
[ 1910.729710]  [<ffffffffc05c8c6e>] ttm_mem_evict_first+0x29e/0x3a0 [amdttm]
[ 1910.736705]  [<ffffffffc05c8f1e>] amdttm_bo_mem_space+0x1ae/0x300 [amdttm]
[ 1910.743696]  [<ffffffffc05c9544>] amdttm_bo_validate+0xc4/0x140 [amdttm]
[ 1910.750529]  [<ffffffffc060c035>] amdgpu_cs_bo_validate+0xa5/0x220 [amdgpu]
[ 1910.757625]  [<ffffffffc060c1f7>] amdgpu_cs_validate+0x47/0x2e0 [amdgpu]
[ 1910.764463]  [<ffffffffc060c1b0>] ? amdgpu_cs_bo_validate+0x220/0x220 [amdgpu]
[ 1910.771736]  [<ffffffffc0620652>] amdgpu_vm_validate_pt_bos+0x92/0x140 [amdgpu]
[ 1910.779248]  [<ffffffffc060e547>] amdgpu_cs_ioctl+0x18a7/0x1d50 [amdgpu]
[ 1910.785992]  [<ffffffffc060cca0>] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[ 1910.793486]  [<ffffffffc04e3f2c>] drm_ioctl_kernel+0x6c/0xb0 [drm]
[ 1910.799777]  [<ffffffffc04e4647>] drm_ioctl+0x1e7/0x420 [drm]
[ 1910.805643]  [<ffffffffc060cca0>] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[ 1910.813090]  [<ffffffffc05ec04b>] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
[ 1910.819639]  [<ffffffff8ae56210>] do_vfs_ioctl+0x3a0/0x5a0
[ 1910.825217]  [<ffffffff8b36744a>] ? __schedule+0x13a/0x890
[ 1910.830795]  [<ffffffff8ae564b1>] SyS_ioctl+0xa1/0xc0
[ 1910.835943]  [<ffffffff8b374ddb>] system_call_fastpath+0x22/0x27
[ 1910.842048] ---[ end trace a5c00b151c061d53 ]---
[ 1910.846814] [TTM] Buffer eviction failed
[ 1910.850838] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
[ 1910.858905] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -22!
.......

Thanks,
Prike
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com> 
Sent: Wednesday, May 22, 2019 9:00 PM
To: Olsak, Marek <Marek.Olsak at amd.com>; Zhou, David(ChunMing) <David1.Zhou at amd.com>; Liang, Prike <Prike.Liang at amd.com>; dri-devel at lists.freedesktop.org; amd-gfx at lists.freedesktop.org
Subject: [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3

This avoids OOM situations when we have lots of threads submitting at the same time.

v3: apply this to the whole driver, not just CS

Signed-off-by: Christian König <christian.koenig at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c    | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 20f2955d2a55..3e2da24cd17a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
        }

        r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-                                  &duplicates, true);
+                                  &duplicates, false);
        if (unlikely(r != 0)) {
                if (r != -ERESTARTSYS)
                        DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 06f83cac0d3a..f660628e6af9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -79,7 +79,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
        list_add(&csa_tv.head, &list);
        amdgpu_vm_get_pd_bo(vm, &list, &pd);

-       r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, true);
+       r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, false);
        if (r) {
                DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
                return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d513a5ad03dd..ed25a4e14404 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -171,7 +171,7 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,

        amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);

-       r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates, true);
+       r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates, false);
        if (r) {
                dev_err(adev->dev, "leaking bo va because "
                        "we fail to reserve bo (%d)\n", r);
@@ -608,7 +608,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,

        amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);

-       r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates, true);
+       r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates, false);
        if (r)
                goto error_unref;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index c430e8259038..d60593cc436e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -155,7 +155,7 @@ static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, bool no_intr)
        struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
        int r;

-       r = ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
+       r = __ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
        if (unlikely(r != 0)) {
                if (r != -ERESTARTSYS)
                        dev_err(adev->dev, "%p reserve failed\n", bo);
--
2.17.1


