[Intel-gfx] [PATCH] drm/i915/ttm: never purge busy objects

Niranjana Vishwanathapura niranjana.vishwanathapura at intel.com
Fri Nov 11 18:11:06 UTC 2022


On Mon, Nov 07, 2022 at 01:30:27PM +0000, Matthew Auld wrote:
>In i915_gem_madvise_ioctl() we immediately purge the object is not
>currently used, like when the mm.pages are NULL.  With shmem the pages
>might still be hanging around or are perhaps swapped out. Similarly with
>ttm we might still have the pages hanging around on the ttm resource,
>like with lmem or shmem, but here we need to be extra careful since
>async unbinds are possible as well as in-progress kernel moves. In
>i915_ttm_purge() we expect the pipeline-gutting to nuke the ttm resource
>for us, however if it's busy the memory is only moved to a ghost object,
>which then leads to broken behaviour when for example clearing the
>i915_tt->filp, since the actual ttm_tt is still alive and populated,
>even though it's been moved to the ghost object.  When we later destroy
>the ghost object we hit the following, since the filp is now NULL:
>
>[  +0.006982] #PF: supervisor read access in kernel mode
>[  +0.005149] #PF: error_code(0x0000) - not-present page
>[  +0.005147] PGD 11631d067 P4D 11631d067 PUD 115972067 PMD 0
>[  +0.005676] Oops: 0000 [#1] PREEMPT SMP NOPTI
>[  +0.012962] Workqueue: events ttm_device_delayed_workqueue [ttm]
>[  +0.006022] RIP: 0010:i915_ttm_tt_unpopulate+0x3a/0x70 [i915]
>[  +0.005879] Code: 89 fb 48 85 f6 74 11 8b 55 4c 48 8b 7d 30 45 31 c0 31 c9 e8 18 6a e5 e0 80 7d 60 00 74 20 48 8b 45 68
>8b 55 08 4c 89 e7 5b 5d <48> 8b 40 20 83 e2 01 41 5c 89 d1 48 8b 70
> 30 e9 42 b2 ff ff 4c 89
>[  +0.018782] RSP: 0000:ffffc9000bf6fd70 EFLAGS: 00010202
>[  +0.005244] RAX: 0000000000000000 RBX: ffff8883e12ae380 RCX: 0000000000000000
>[  +0.007150] RDX: 000000008000000e RSI: ffffffff823559b4 RDI: ffff8883e12ae3c0
>[  +0.007142] RBP: ffff888103b65d48 R08: 0000000000000001 R09: 0000000000000001
>[  +0.007144] R10: 0000000000000001 R11: ffff88829c2c8040 R12: ffff8883e12ae3c0
>[  +0.007148] R13: 0000000000000001 R14: ffff888115184140 R15: ffff888115184248
>[  +0.007154] FS:  0000000000000000(0000) GS:ffff88844db00000(0000) knlGS:0000000000000000
>[  +0.008108] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[  +0.005763] CR2: 0000000000000020 CR3: 000000013fdb4004 CR4: 00000000003706e0
>[  +0.007152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>[  +0.007145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>[  +0.007154] Call Trace:
>[  +0.002459]  <TASK>
>[  +0.002126]  ttm_tt_unpopulate.part.0+0x17/0x70 [ttm]
>[  +0.005068]  ttm_bo_tt_destroy+0x1c/0x50 [ttm]
>[  +0.004464]  ttm_bo_cleanup_memtype_use+0x25/0x40 [ttm]
>[  +0.005244]  ttm_bo_cleanup_refs+0x90/0x2c0 [ttm]
>[  +0.004721]  ttm_bo_delayed_delete+0x235/0x250 [ttm]
>[  +0.004981]  ttm_device_delayed_workqueue+0x13/0x40 [ttm]
>[  +0.005422]  process_one_work+0x248/0x560
>[  +0.004028]  worker_thread+0x4b/0x390
>[  +0.003682]  ? process_one_work+0x560/0x560
>[  +0.004199]  kthread+0xeb/0x120
>[  +0.003163]  ? kthread_complete_and_exit+0x20/0x20
>[  +0.004815]  ret_from_fork+0x1f/0x30
>
>Fixes: 213d50927763 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
>Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
>Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>Cc: Andrzej Hajda <andrzej.hajda at intel.com>
>Cc: Nirmoy Das <nirmoy.das at intel.com>
>Cc: <stable at vger.kernel.org> # v5.15+
>---
> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>index 25129af70f70..fb452a388b5e 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>@@ -599,10 +599,18 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
> static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
> {
> 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>+	struct ttm_operation_ctx ctx = {
>+		.interruptible = true,
>+		.no_wait_gpu = false,
>+	};
> 	int err;
>
> 	WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
>
>+	err = ttm_bo_wait_ctx(bo, &ctx);
>+	if (err)
>+		return err;
>+

We probably can call ttm_bo_wait() directly without the ctx?

Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>

> 	err = i915_ttm_move_notify(bo);
> 	if (err)
> 		return err;
>-- 
>2.38.1
>


More information about the Intel-gfx mailing list