[Mesa-dev] [PATCH 4/4] i965: Drop the batch and limp along if execbuf fails.

Jason Ekstrand jason at jlekstrand.net
Sun Sep 24 21:53:04 UTC 2017


I've got this a few times recently and it's really annoying.  I don't know 
if this will fix anything or not but it may be worth a go.  I fear, 
however, that ignoring an execbuf failure will lead to permanently 
corrupted rendering or even additional hangs due to a chunk of the stream 
being missing.  That seems undesirable.  I would feel more comfortable 
about if you flagged BRW_NEW_CONTEXT in this cases to force a full state 
re-emit.


On September 24, 2017 1:02:18 AM Kenneth Graunke <kenneth at whitecape.org> wrote:

> The execbuf2 ioctl can fail for several reasons:
>
> - a catastrophic bug in Mesa (we're programming garbage commands)
> - repeated GPU hangs, where the kernel has stepped in and banned our
>   process (or at least fd) from talking to the GPU anymore
> - some sort of transient failures (low memory, GPU resetting a lot?)
>
> We've not been too concerned with handling this case, because we thought
> that the first two were the only ways this could happen.  In those cases
> (which shouldn't happen anyway) it's probably better to exit and avoid
> sabotaging the GPU repeatedly, which potentially could tank the system.
>
> But it seems like we can hit this occasionally in other circumstances.
> It appears to happen in certain low memory situations.  It might also
> happen if someone else is tanking the GPU a bunch of times.  When the
> failures are temporary, it's rude to outright kill the application
> (especially if it's the X server or Wayland compositor).
>
> With this patch, we raise a GL_OUT_OF_MEMORY error and move on, ignoring
> the failure.  For normal flushing, we'll make a new batch and proceed as
> normal - hoping that things will work out better in the future, and that
> we miraculously avoid things like mapping failures which could cause us
> to crash.  For fencing-triggered flushes, we drop the batch reference so
> we don't block on it forever.
>
> I'm not entirely sure how this will work out in practice, but the
> existing code is dire, so we may as well give it a try and hope it
> works out better for our users.
> ---
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
> b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index d564510d06a..bc8a2283f9c 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -782,6 +782,7 @@ submit_batch(struct brw_context *brw, int in_fence_fd, 
> int *out_fence_fd)
>     const struct gen_device_info *devinfo = &brw->screen->devinfo;
>     __DRIscreen *dri_screen = brw->screen->driScrnPriv;
>     struct intel_batchbuffer *batch = &brw->batch;
> +   struct gl_context *ctx = &brw->ctx;
>     int ret = 0;
>
>     if (batch->batch_cpu_map) {
> @@ -865,9 +866,8 @@ submit_batch(struct brw_context *brw, int in_fence_fd, 
> int *out_fence_fd)
>        brw_check_for_reset(brw);
>
>     if (ret != 0) {
> -      fprintf(stderr, "i965: Failed to submit batchbuffer: %s\n",
> -              strerror(-ret));
> -      exit(1);
> +      _mesa_error(ctx, GL_OUT_OF_MEMORY,
> +                  "i965: Failed to submit batchbuffer: %s\n", strerror(-ret));
>     }
>
>     return ret;
> --
> 2.14.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev




More information about the mesa-dev mailing list