[Mesa-dev] [v3 4/6] i965/rbc: Consult rb settings for texture surface setup

Pohjolainen, Topi topi.pohjolainen at gmail.com
Fri Sep 9 18:00:10 UTC 2016


On Fri, Sep 09, 2016 at 07:11:16AM -0700, Jason Ekstrand wrote:
>    On Sep 8, 2016 11:13 PM, "Pohjolainen, Topi"
>    <[1]topi.pohjolainen at gmail.com> wrote:
>    >
>    > On Thu, Sep 08, 2016 at 08:49:56AM -0700, Jason Ekstrand wrote:
>    > >    On Sep 7, 2016 9:30 PM, "Pohjolainen, Topi"
>    > >    <[1][2]topi.pohjolainen at gmail.com> wrote:
>    > >    >
>    > >    > On Wed, Sep 07, 2016 at 03:25:30PM -0700, Jason Ekstrand
>    wrote:
>    > >    > >    On Sep 7, 2016 10:24 AM, "Topi Pohjolainen"
>    > >    > >    <[1][2][3]topi.pohjolainen at gmail.com> wrote:
>    > >    > >    >
>    > >    > >    > Once mcs buffer gets allocated without delay for
>    lossless
>    > >    > >    > compression (same as we do for msaa), one gets
>    regression in:
>    > >    > >    >
>    > >    > >    > GL45-CTS.texture_barrier_ARB.same-texel-rw
>    > >    > >    >
>    > >    > >    > Setting the auxiliary surface for both sampling engine
>    and
>    > >    data
>    > >    > >    > port seems to fix this. I haven't found any hardware
>    > >    documentation
>    > >    > >    > backing this though.
>    > >    > >    >
>    > >    > >    > v2 (Jason): Prepare also for the case where surface is
>    sampled
>    > >    with
>    > >    > >    >             non-compressible format forcing also
>    rendering
>    > >    without
>    > >    > >    >             compression.
>    > >    > >    > v3: Split asserts and decision making.
>    > >    > >    >
>    > >    > >    > Signed-off-by: Topi Pohjolainen
>    > >    <[2][3][4]topi.pohjolainen at intel.com>
>    > >    > >    > ---
>    > >    > >    >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 63
>    > >    > >    +++++++++++++++++++++---
>    > >    > >    >  1 file changed, 56 insertions(+), 7 deletions(-)
>    > >    > >    >
>    > >    > >    > diff --git
>    a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>    > >    > >    b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>    > >    > >    > index c1273c5..054c5c8 100644
>    > >    > >    > --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>    > >    > >    > +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>    > >    > >    > @@ -140,9 +140,7 @@ brw_emit_surface_state(struct
>    brw_context
>    > >    *brw,
>    > >    > >    >     struct isl_surf *aux_surf = NULL, aux_surf_s;
>    > >    > >    >     uint64_t aux_offset = 0;
>    > >    > >    >     enum isl_aux_usage aux_usage = ISL_AUX_USAGE_NONE;
>    > >    > >    > -   if (mt->mcs_mt &&
>    > >    > >    > -       ((view.usage &
>    ISL_SURF_USAGE_RENDER_TARGET_BIT) ||
>    > >    > >    > -        mt->fast_clear_state !=
>    > >    INTEL_FAST_CLEAR_STATE_RESOLVED)) {
>    > >    > >    > +   if (mt->mcs_mt && !(flags &
>    INTEL_AUX_BUFFER_DISABLED)) {
>    > >    > >    >        intel_miptree_get_aux_isl_surf(brw, mt,
>    &aux_surf_s,
>    > >    > >    &aux_usage);
>    > >    > >    >        aux_surf = &aux_surf_s;
>    > >    > >    >        assert(mt->mcs_mt->offset == 0);
>    > >    > >    > @@ -425,6 +423,54 @@ swizzle_to_scs(GLenum swizzle,
>    bool
>    > >    > >    need_green_to_blue)
>    > >    > >    >     return (need_green_to_blue && scs == HSW_SCS_GREEN)
>    ?
>    > >    > >    HSW_SCS_BLUE : scs;
>    > >    > >    >  }
>    > >    > >    >
>    > >    > >    > +static unsigned
>    > >    > >    > +brw_find_matching_rb(const struct gl_framebuffer *fb,
>    > >    > >    > +                     const struct intel_mipmap_tree
>    *mt)
>    > >    > >    > +{
>    > >    > >    > +   for (unsigned i = 0; i < fb->_NumColorDrawBuffers;
>    i++) {
>    > >    > >    > +      const struct intel_renderbuffer *irb =
>    > >    > >    > +         intel_renderbuffer(fb->_ColorDrawBuffers[i]);
>    > >    > >    > +
>    > >    > >    > +      if (irb->mt == mt)
>    > >    > >    > +         return i;
>    > >    > >    > +   }
>    > >    > >    > +
>    > >    > >    > +   return fb->_NumColorDrawBuffers;
>    > >    > >    > +}
>    > >    > >    > +
>    > >    > >    > +static bool
>    > >    > >    > +brw_disable_aux_surface(const struct brw_context *brw,
>    > >    > >    > +                        const struct intel_mipmap_tree
>    *mt)
>    > >    > >    > +{
>    > >    > >    > +   /* Nothing to disable. */
>    > >    > >    > +   if (!mt->mcs_mt)
>    > >    > >    > +      return false;
>    > >    > >    > +
>    > >    > >    > +   /* There are special cases only for lossless
>    compression.
>    > >    */
>    > >    > >    > +   if (!intel_miptree_is_lossless_compressed(brw, mt))
>    > >    > >    > +      return mt->fast_clear_state ==
>    > >    > >    INTEL_FAST_CLEAR_STATE_RESOLVED;
>    > >    > >    > +
>    > >    > >    > +   const struct gl_framebuffer *fb =
>    brw->ctx.DrawBuffer;
>    > >    > >    > +   const unsigned rb_index = brw_find_matching_rb(fb,
>    mt);
>    > >    > >    > +
>    > >    > >    > +   /* In practise it looks that setting the same
>    lossless
>    > >    compressed
>    > >    > >    surface
>    > >    > >    > +    * to be sampled without auxiliary surface and to
>    be
>    > >    written with
>    > >    > >    auxiliary
>    > >    > >    > +    * surface confuses the hardware. Therefore sampler
>    engine
>    > >    must
>    > >    > >    be provided
>    > >    > >    > +    * with auxiliary buffer regardless of the fast
>    clear
>    > >    state if
>    > >    > >    the same
>    > >    > >    > +    * surface is also going to be written during the
>    same
>    > >    rendering
>    > >    > >    pass with
>    > >    > >    > +    * auxiliary buffer enabled.
>    > >    > >    > +    */
>    > >    > >    > +   if (rb_index < fb->_NumColorDrawBuffers) {
>    > >    > >    > +      if (brw->draw_aux_buffer_disabled[rb_index]) {
>    > >    > >    > +         assert(mt->fast_clear_state ==
>    > >    > >    INTEL_FAST_CLEAR_STATE_RESOLVED);
>    > >    > >    > +      }
>    > >    > >    > +
>    > >    > >    > +      return brw->draw_aux_buffer_disabled[rb_index];
>    > >    > >
>    > >    > >    Can we use "mt->fast_clear_state ==
>    > >    INTEL_FAST_CLEAR_STATE_RESOLVED"
>    > >    > >    here as well?  For render targets we want to look at
>    > >    > >    aux_buffer_disabled but for textures (which is the only
>    case
>    > >    that uses
>    > >    > >    this function), fast_clear_state should be sufficient.
>    > >    >
>    > >    > Like the comment says, we can't skip setting aux buffer for
>    sampling
>    > >    in case
>    > >    > the same surface is written as render target with aux enabled.
>    > >    Hardware
>    > >    > doesn't seem to like this. This is specifically for that
>    purpose
>    > >    checking
>    > >    > the flag.
>    > >
>    > >    Let me see if I understand this correctly: this handles the case
>    where
>    > >    the miptree is bound as both a render target and a texture and
>    there is
>    > >    nothing about the texture view scenario causing it to skip
>    compression
>    > >    and, for some reason, the surface starts of the draw call fully
>    > >    resolved.  If *any* drawing has been done without a resolve, we
>    would
>    > >    be in the UNRESOLVED state and this wouldn't be needed.  That's
>    > >    terribly specific but also a valid use case.
>    > >
>    > >    We could potentially handle this by setting the render target
>    surfaces
>    > >    to UNRESOLVED a bit earlier in the pipeline setup code.  I
>    believe we
>    > >    flag things as unresolved rather late; in particular, after
>    we've done
>    > >    our texture state setup. The other option is to do exactly what
>    you did
>    > >    and add a comment.  That's probably the best option.  With a
>    good
>    > >    comment added,
>    >
>    > So, above there is already:
>    >
>    >     * In practise it looks that setting the same lossless compressed
>    surface
>    >     * to be sampled without auxiliary surface and to be written with
>    auxiliary
>    >     * surface confuses the hardware. Therefore sampler engine must be
>    provided
>    >     * with auxiliary buffer regardless of the fast clear state if the
>    same
>    >     * surface is also going to be written during the same rendering
>    pass with
>    >     * auxiliary buffer enabled.
>    >
>    > What would you add?
> 
>    Hmmm... I think my reservations about that convent are because it
>    sounds like a workaround.  How about this:
> 
>    If we are drawing into this with compression enabled, then we must also
>    enable compression when texturing from it regardless of
>    fast_clear_state.  If we don't then, after the first draw call with
>    this setup, there will be data in the CCS which won't get picked up by
>    subsequent texturing operations as required by ARB_texture_barrier.
>    Since we don't want to re-emit the binding table or do a resolve
>    operation every draw call, the easiest thing to do is just enable
>    compression on the texturing side.  This is completely safe to do
>    since, if compressed texturing weren't allowed, we would have disabled
>    compression of render targets in whatever_that_function_is_called().

This sounds really good, I'll make the change.

> 
>    > >
>    > >    Reviewed-by: Jason Ekstrand <[4][5]jason at jlekstrand.net>

Thanks!


More information about the mesa-dev mailing list