[Mesa-dev] [PATCH 3/3] i965: Take responsibility for context recovery after any GPU hang

Chris Wilson chris at chris-wilson.co.uk
Mon Feb 18 12:22:11 UTC 2019


To make wedging even more likely, we use a new "no recovery" context
parameter that tells the kernel to not even attempt to replay any
batches in flight against the default context image, as experience shows
the HW is not always robust enough to cope with the conflicting state.
This allows us to always take over responsibility for rebuilding the lost
context following a GPU hang.

Cc: Kenneth Graunke <kenneth at whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
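A toy model of both policies makes the trade-off easier to see. The sketch below is hypothetical: none of its names exist in Mesa or the kernel, and BAN_AFTER_HANGS is an arbitrary stand-in for the kernel's real banning heuristics. It makes three assumptions. First, once a context is reset to the default image, every later batch hangs again because the invariant state uploaded once at creation is gone. Second, a lost context reports -EIO on its next submission. Third, the driver then creates a fresh context.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of the two recovery policies; not Mesa or kernel code. */
#define BAN_AFTER_HANGS 3 /* arbitrary stand-in for the kernel's ban heuristic */

struct model_ctx {
   bool recoverable;   /* analogue of I915_CONTEXT_PARAM_RECOVERABLE */
   bool has_invariant; /* invariant state is uploaded once, at creation */
   bool lost;          /* kernel rejects all further batches with -EIO */
   int hangs;
};

static void model_create(struct model_ctx *c, bool recoverable)
{
   c->recoverable = recoverable;
   c->has_invariant = true; /* like the one-time brw_initial_gpu_upload() */
   c->lost = false;
   c->hangs = 0;
}

/* Returns 0 if the batch ran, 1 if it hung, -EIO if it was rejected. */
static int model_submit(struct model_ctx *c, bool inject_hang)
{
   if (c->lost)
      return -EIO;
   if (inject_hang || !c->has_invariant) {
      c->hangs++;
      /* The reset restores the default image, so invariant state is gone. */
      c->has_invariant = false;
      /* Non-recoverable contexts are lost at once; recoverable ones keep
       * replaying (and hanging) until they are banned. */
      if (!c->recoverable || c->hangs >= BAN_AFTER_HANGS)
         c->lost = true;
      return 1;
   }
   return 0;
}

/* Submits nbatches (only the first is forced to hang) and counts the losses. */
static int count_lost_batches(bool recoverable, int nbatches)
{
   struct model_ctx ctx;
   int lost = 0;

   assert(nbatches > 0);
   model_create(&ctx, recoverable);
   for (int i = 0; i < nbatches; i++) {
      int ret = model_submit(&ctx, i == 0);
      if (ret != 0)
         lost++;
      if (ret == -EIO)
         model_create(&ctx, recoverable); /* driver rebuilds the context */
   }
   return lost;
}
```

With the first batch forced to hang, count_lost_batches(false, 10) returns 2: the guilty batch plus the submission that reports the loss. Under the same assumptions, count_lost_batches(true, 10) returns BAN_AFTER_HANGS + 1, mirroring the "2 lost batches instead of a continual stream" argument in the init_context() comment.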

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index 1248f8b9fa4..a0cbc315b60 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -1589,6 +1589,28 @@ init_cache_buckets(struct brw_bufmgr *bufmgr)
    }
 }
 
+static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id)
+{
+   /*
+    * Upon declaring a GPU hang, the kernel will zap the guilty context
+    * back to the default logical HW state and attempt to continue on to
+    * our next submitted batchbuffer. However, we only send incremental
+    * logical state (e.g. we only ever set up invariant register state
+    * once in brw_initial_gpu_upload()) and so attempting to replay the
+    * next batchbuffer without the correct logical state can be fatal.
+    * Here we tell the kernel not to attempt to recover our context but
+    * immediately (on the next batchbuffer submission) report that the
+    * context is lost, and we will do the recovery ourselves -- 2 lost
+    * batches instead of a continual stream until we are banned, or the
+    * machine is dead.
+    */
+   struct drm_i915_gem_context_param p = {
+      .ctx_id = ctx_id,
+      .param = I915_CONTEXT_PARAM_RECOVERABLE,
+   };
+   drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
+}
+
 uint32_t
 brw_create_hw_context(struct brw_bufmgr *bufmgr)
 {
@@ -1599,6 +1621,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr)
       return 0;
    }
 
+   init_context(bufmgr, create.ctx_id);
+
    return create.ctx_id;
 }
 
-- 
2.20.1
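init_context() relies on C designated-initializer semantics. Every member it does not name is zero-initialized, so the ioctl is sent with size == 0 and value == 0. A value of 0 for I915_CONTEXT_PARAM_RECOVERABLE asks the kernel to treat the context as non-recoverable. The sketch below checks the initializer without a GPU by using a local mirror of struct drm_i915_gem_context_param. The mirror struct, the helper name, and the numeric parameter value (0x8) are assumptions for illustration; real code should take all of them from the i915 uapi header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of the i915 uapi struct
 * drm_i915_gem_context_param, for illustration only; real code should
 * include the kernel/libdrm i915_drm.h header instead. */
struct ctx_param_mirror {
   uint32_t ctx_id;
   uint32_t size;   /* expected to be 0 for the RECOVERABLE parameter */
   uint64_t param;
   uint64_t value;  /* 0 = do not recover, non-zero = allow recovery */
};

/* 4 + 4 + 8 + 8 bytes with no padding, as in the uapi layout. */
static_assert(sizeof(struct ctx_param_mirror) == 24, "unexpected layout");

/* Assumed to match I915_CONTEXT_PARAM_RECOVERABLE; verify against the header. */
#define MIRROR_PARAM_RECOVERABLE 0x8

static struct ctx_param_mirror make_no_recovery_param(uint32_t ctx_id)
{
   /* Same initializer shape as init_context(): members not named in a
    * designated initializer are zero-initialized, so .size and .value
    * are both 0 here. */
   struct ctx_param_mirror p = {
      .ctx_id = ctx_id,
      .param = MIRROR_PARAM_RECOVERABLE,
   };
   return p;
}
```

In the patch, the resulting struct is passed directly to drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p), and the return value is ignored. On a kernel that does not recognize the parameter, the call presumably just fails and the context keeps the kernel's default recovery behavior.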
