[Intel-gfx] [RFC PATCH 160/162] drm/i915/dg1: Fix GPU hang due to shmemfs page drop

Matthew Auld matthew.auld at intel.com
Fri Nov 27 12:07:16 UTC 2020


From: Venkata Ramana Nayana <venkata.ramana.nayana at intel.com>

This is to fix a bug in upstream
commit a6326a4f8ffb ("drm/i915/gt: Keep a no-frills swappable copy of the default context state")

We allocate context state obj ce->state from lmem, so in __engines_record_defaults(),
we call shmem_create_from_object(). Because it is lmem object, this call will
create a new shmemfs file, copy the contents into it, and return the file
pointer and assign to engine->default_state. Of course ce->state lmem object
is freed at the end of function __engines_record_redefaults().

Because a new shmemfs file is create for engine->default_state,
and more importantly, we DON'T mark the pages dirty after we write into it,
the OS page cache eviction will drop these pages.

Now with the test move forward, it will create new request/context, and will
copy the saved engine->default_state into ce->state. If the default_state
pages are dropped during page cache eviction, the copying will get new pages,
and copy garbage from the new pages. Next, ce->state will have wrong
instruction and causes GPU to hang.

The fixing is very simple, we just mark the shmemfs pages to be dirty when
writing into it, and also mark the pages to accessed when read/write to them.

Fixes: a6326a4f8ffb("drm/i915/gt: Keep a no-frills swappable copy of the default context state")
Cc: Sudeep Dutt <sudeep.dutt at intel.com>
Cc: Matthew Auld <matthew.auld at intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Cc: Ramalingam C <ramalingam.c at intel.com>
Cc: Chris Wilson <chris at intel.com>
Signed-off-by: CQ Tang <cq.tang at intel.com>
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana at intel.com>
---
 drivers/gpu/drm/i915/gt/shmem_utils.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index 1fbc070a4651..e24c2c2342bb 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -105,10 +105,13 @@ static int __shmem_rw(struct file *file, loff_t off,
 			return PTR_ERR(page);
 
 		vaddr = kmap(page);
-		if (write)
+		if (write) {
 			memcpy(vaddr + offset_in_page(off), ptr, this);
-		else
+			set_page_dirty(page);
+		} else {
 			memcpy(ptr, vaddr + offset_in_page(off), this);
+		}
+		mark_page_accessed(page);
 		kunmap(page);
 		put_page(page);
 
-- 
2.26.2



More information about the Intel-gfx mailing list