[Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.

Tue May 8 17:42:02 UTC 2018

Kenneth Graunke <kenneth at whitecape.org> writes:

> We'd like to start using soft-pin to assign BO addresses up front, and
> never move them again.  Our previous plan for dealing with 48-bit VF
> cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
> have addresses that alias in the low 32 bits.  But that requires moving
> buffers dynamically.
>
> This patch tracks the last seen BO address for each vertex/index buffer,
> and emits a VF cache invalidate if the high bits change.  (Ideally, we
> won't hit this case very often.)  This should work for the soft-pin
> case, but unfortunately won't work in the relocation case, as we don't
> actually know the addresses.  So, we have to use both methods.
> ---
>  src/mesa/drivers/dri/i965/brw_context.h       |  6 ++
>  src/mesa/drivers/dri/i965/genX_state_upload.c | 62 +++++++++++++++++++
>  2 files changed, 68 insertions(+)
>
> Migration is nice at times, but keeping everything static is also really
> nice for pre-baking states...hard to know exactly what to do here.  It
> would be nice if we could allocate all GL buffer objects that might be
> VBOs out of the same 4GB, but that's...difficult to know, and might be
> too limiting.  *shrug*
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index fb637e22281..7d6aa1a9c51 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -983,6 +983,9 @@ struct brw_context
>  
>        /* For the initial pushdown, keep the list of vbo inputs. */
>        struct vbo_inputs draw_arrays;
> +
> +      /* High bits of the last seen vertex buffer address (for workarounds). */
> +      uint16_t last_bo_high_bits[33];

Maybe it's time to add a #define for 33 being the maximum number of
vertex buffers.

>     } vb;
>  
>     struct {
> @@ -1003,6 +1006,9 @@ struct brw_context
>         * referencing the same index buffer.
>         */
>        unsigned int start_vertex_offset;
> +
> +      /* High bits of the last seen index buffer address (for workarounds). */
> +      uint16_t last_bo_high_bits;
>     } ib;
>  
>     /* Active vertex program:
> diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c
> index b1867c1a1cc..e517b91de93 100644
> --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> @@ -480,6 +480,64 @@ upload_format_size(uint32_t upload_format)
>     }
>  }
>  
> +static UNUSED uint16_t
> +pinned_bo_high_bits(struct brw_bo *bo)
> +{
> +   return (bo->kflags & EXEC_OBJECT_PINNED) ? bo->gtt_offset >> 32ull : 0;
> +}
> +
> +/* The VF cache designers apparently cut corners, and made the cache
> + * only consider the bottom 32 bits of memory addresses.

You mentioned it in the thread here, but I think it's important to
mention in the comment too that the cache is considering the tuple of
the bottom 32 bits and the vertex buffer's index in the state
message. Otherwise it would look like an individual set of vertices
could be self-conflicting between indices.

with those,

Reviewed-by: Scott D Phillips <scott.d.phillips at intel.com>