[Mesa-dev] [PATCH 3/4] i965: Prevent infinite finish_batch recursion

Fri Feb 27 10:26:17 PST 2015

On Fri, Feb 27, 2015 at 10:22:10AM -0800, Ben Widawsky wrote:
> From the comments in the code:
> 
>    Called from intel_batchbuffer_flush before emitting MI_BATCHBUFFER_END and
>    sending it off.
> 
> This fixes a possible, unlikely infinite recursion in our batch flush path. More
> importantly it allows me to add some code here.
> 
> The relevant part of the call chain for flush
> intel_batchbuffer_flush()->brw_finish_batch(). The problem arises if anything in
> the time from intel_batchbuffer_flush, until brw_finish_batch ends up calling
> itself. This can happen as a result of a call to
> intel_batchbuffer_begin()->intel_batchbuffer_require_space()->intel_batchbuffer_flush().
> 
> There are two possible cases today which can spawn this recursion.
> 1. There is a ring switch occurring (impossible, see below)
> 2. The ring is out of space (fairly unlikely)
> 
> brw_finish_batch() currently only has two cases where we end up calling
> intel_batchbuffer_begin().
> 1. Query objects pre hardware contexts:
>    Post hardware contexts, this is a NOP. Pre hardware contexts we did not have
>    a blitter ring. As mesa does not currently use the BSD ring (and probably
>    never will pre-GEN6), the only way to have a problem is a full ring (#2
>    above).
> 
> 2. brw_perf_monitor_finish_batch:
>    Here too it's only used pre GEN6 which means no ring switches are possible,
>    and so only a full ring can cause an issue.
> 
> Therefore only when we run out of ring space at batchbuffer flush do we have an
> issue.
> 
> I've implemented the cheap hack to fix this. It is totally not thread safe.  The
> proper way to fix this is with a lock as callers do not expect an early return
> before the flush has occurred. I'm not really certain if we can ever actually
> hit this path with mulththreaded situations, so cheap hack wins.
> 
> The same bug and reasoning (in fact it's more likely) may exist in the i915
> driver as well.
> 
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> ---
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index db2f345..b0ebec7 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -209,6 +209,19 @@ brw_new_batch(struct brw_context *brw)
>  static void
>  brw_finish_batch(struct brw_context *brw)
>  {
> +   /* FIXME: not threadsafe */
> +   static int nest_depth = 0;
> +   assert (nest_depth >= 0);
> +
> +   if (nest_depth++) {
> +      /* It's actually possible to get here without multithreading (no ringspace
> +       * + ring switch). Oh well. */
> +      if (nest_depth >= 2)
> +         _mesa_warning(NULL, "Multhreaded batch flushing is out of order\n");
> +
> +      return;
> +   }
> +
>     /* Capture the closing pipeline statistics register values necessary to
>      * support query objects (in the non-hardware context world).
>      */
> @@ -223,6 +236,8 @@ brw_finish_batch(struct brw_context *brw)
>      * next batch.
>      */
>     brw->cache.bo_used_by_gpu = true;
> +
> +   nest_depth--;
> 

Whoops, forgot to commit.. this is nest_depth = 0