[Mesa-dev] [PATCH 4/4] i965: Drop the batch and limp along if execbuf fails.

Kenneth Graunke kenneth at whitecape.org
Sun Sep 24 06:02:06 UTC 2017


The execbuf2 ioctl can fail for several reasons:

- a catastrophic bug in Mesa (we're programming garbage commands)
- repeated GPU hangs, where the kernel has stepped in and banned our
  process (or at least fd) from talking to the GPU anymore
- some sort of transient failure (low memory, the GPU resetting a lot?)

We've not been too concerned with handling this case, because we thought
that the first two were the only ways this could happen.  In those cases
(which shouldn't happen anyway) it's probably better to exit and avoid
sabotaging the GPU repeatedly, which could potentially tank the system.

But it seems like we can occasionally hit this in other circumstances.
It appears to happen in certain low-memory situations.  It might also
happen if someone else is tanking the GPU a bunch of times.  When the
failures are temporary, it's rude to outright kill the application
(especially if it's the X server or Wayland compositor).

With this patch, we raise a GL_OUT_OF_MEMORY error and move on, ignoring
the failure.  For normal flushing, we'll make a new batch and proceed as
normal, hoping that things will work out better in the future and that
we miraculously avoid things like mapping failures which could cause us
to crash.  For fencing-triggered flushes, we drop the batch reference so
we don't block on it forever.

I'm not entirely sure how this will work out in practice, but the
existing code is dire, so we may as well give it a try and hope it
works out better for our users.
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index d564510d06a..bc8a2283f9c 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -782,6 +782,7 @@ submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
    const struct gen_device_info *devinfo = &brw->screen->devinfo;
    __DRIscreen *dri_screen = brw->screen->driScrnPriv;
    struct intel_batchbuffer *batch = &brw->batch;
+   struct gl_context *ctx = &brw->ctx;
    int ret = 0;
 
    if (batch->batch_cpu_map) {
@@ -865,9 +866,8 @@ submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
       brw_check_for_reset(brw);
 
    if (ret != 0) {
-      fprintf(stderr, "i965: Failed to submit batchbuffer: %s\n",
-              strerror(-ret));
-      exit(1);
+      _mesa_error(ctx, GL_OUT_OF_MEMORY,
+                  "i965: Failed to submit batchbuffer: %s\n", strerror(-ret));
    }
 
    return ret;
-- 
2.14.1


