[Mesa-dev] [PATCH 4/4] i965: Reimplement all the PIPE_CONTROL rules.

Pohjolainen, Topi topi.pohjolainen at gmail.com
Mon Feb 25 19:01:33 UTC 2019


On Mon, Feb 25, 2019 at 10:32:27AM -0800, Kenneth Graunke wrote:
> On Monday, February 25, 2019 6:33:11 AM PST Pohjolainen, Topi wrote:
> > On Thu, Nov 01, 2018 at 08:04:21PM -0700, Kenneth Graunke wrote:
> > > This implements virtually all documented PIPE_CONTROL restrictions
> > > in a centralized helper.  You now simply ask for the operations you
> > > want, and the pipe control "brain" will figure out exactly what pipe
> > > controls to emit to make that happen without tanking your system.
> > > 
> > > The hope is that this will fix some intermittent flushing issues as
> > > well as GPU hangs.  However, it also has a high risk of causing GPU
> > > hangs and other regressions, as this is a particularly sensitive
> > > area and poking the bear isn't always advisable.
> > 
> > First I checked I could find all the things in bspec. There was one that I
> > couldn't, noted further down.
> > 
> > Second I checked that all the rules earlier were implemented. Found one
> > exception, noted further down as well.
> > 
> > I didn't check if the rules still miss something in bspec. That would be
> > another exercise...
> 
> [snip]
> > > +   /* Recursive PIPE_CONTROL workarounds --------------------------------
> > > +    * (http://knowyourmeme.com/memes/xzibit-yo-dawg)
> > > +    *
> > > +    * We do these first because we want to look at the original operation,
> > > +    * rather than any workarounds we set.
> > > +    */
> > > +   if (GEN_GEN == 6 && (flags & PIPE_CONTROL_RENDER_TARGET_FLUSH)) {
> > > +      /* Hardware workaround: SNB B-Spec says:
> > > +       *
> > > +       *    "[Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache Flush
> > > +       *     Enable = 1, a PIPE_CONTROL with any non-zero post-sync-op is
> > > +       *     required."
> > > +       */
> > > +      brw_emit_post_sync_nonzero_flush(brw);
> > > +   }
> > > +
> > > +   if (GEN_GEN == 9 && (flags & PIPE_CONTROL_VF_CACHE_INVALIDATE)) {
> > > +      /* The PIPE_CONTROL "VF Cache Invalidation Enable" bit description
> > > +       * lists several workarounds:
> > > +       *
> > > +       *    "Project: SKL, KBL, BXT
> > > +       *
> > > +       *     If the VF Cache Invalidation Enable is set to a 1 in a
> > > +       *     PIPE_CONTROL, a separate Null PIPE_CONTROL, all bitfields
> > > +       *     sets to 0, with the VF Cache Invalidation Enable set to 0
> > > +       *     needs to be sent prior to the PIPE_CONTROL with VF Cache
> > > +       *     Invalidation Enable set to a 1."
> > > +       */
> > > +      genX(emit_raw_pipe_control)(brw, 0, NULL, 0, 0);
> > > +   }
> > > +
> > > +   if (GEN_GEN == 9 && IS_COMPUTE_PIPELINE(brw) && post_sync_flags) {
> > > +      /* Project: SKL / Argument: LRI Post Sync Operation [23]
> > > +       *
> > > +       * "PIPECONTROL command with “Command Streamer Stall Enable” must be
> > > +       *  programmed prior to programming a PIPECONTROL command with "LRI
> > > +       *  Post Sync Operation" in GPGPU mode of operation (i.e when
> > > +       *  PIPELINE_SELECT command is set to GPGPU mode of operation)."
> > > +       *
> > > +       * The same text exists a few rows below for Post Sync Op.
> > > +       */
> > > +      genX(emit_raw_pipe_control)(brw, PIPE_CONTROL_CS_STALL, bo, offset, imm);
> > 
> > Are bo, offset, imm needed here as well?
> 
> No, I don't think so.  We should pass NULL, 0, 0 here - we're just doing
> a plain CS stall separately.  We'll use them for the actual write later.
> 
> Good catch!
> 
> [snip]
> > > -         if (GEN_GEN >= 9) {
> > > -            /* THE PIPE_CONTROL "VF Cache Invalidation Enable" docs continue:
> > > -             *
> > > -             *    "Project: BDW+
> > > -             *
> > > -             *     When VF Cache Invalidate is set “Post Sync Operation” must
> > > -             *     be enabled to “Write Immediate Data” or “Write PS Depth
> > > -             *     Count” or “Write Timestamp”."
> > > -             *
> > > -             * If there's a BO, we're already doing some kind of write.
> > > -             * If not, add a write to the workaround BO.
> > > -             *
> > > -             * XXX: This causes GPU hangs on Broadwell, so restrict it to
> > > -             *      Gen9+ for now...see this bug for more information:
> > > -             *      https://bugs.freedesktop.org/show_bug.cgi?id=103787
> > 
> > In "Flush Types" workarounds later on you apply this for gen8 as well.
> 
> Yes, that's intentional - we're supposed to according to the docs.
> I re-tested the Piglit test from bug 103787 on my Broadwell, and it
> works fine - no GPU hangs.  I think we were just doing it wrong before.
> 
> Trying to figure out an ordering for the workarounds is awful... :(

What would you think about another patch just before this to enable that for
gen8? Just in case it causes problems it would bisect to much smaller patch.

> 
> > > -             */
> > > -            if (!bo) {
> > > -               flags |= PIPE_CONTROL_WRITE_IMMEDIATE;
> > > -               bo = brw->workaround_bo;
> > > -            }
> > > -         }
> > > +   if (GEN_IS_HASWELL) {
> > > +      /* From the PIPE_CONTROL page itself:
> > > +       *
> > > +       *    "HSW - Programming Note: PIPECONTROL with RO Cache Invalidation:
> > > +       *     Prior to programming a PIPECONTROL command with any of the RO
> > > +       *     cache invalidation bit set, program a PIPECONTROL flush command
> > > +       *     with “CS stall” bit and “HDC Flush” bit set."
> > > +       *
> > > +       * TODO: Actually implement this.  What's an HDC Flush?
> > 
> > There is bit 9 HDC Flush but that is defined for ICL, for HSW I couldn't find
> > anything either.
> 
> Yeah...I figured I'd call it out, but...I don't think we're doing
> anything today either, so this is at least no worse.
> 
> > > +       */
> > > +   }
> > > +
> > > +   if (flags & PIPE_CONTROL_FLUSH_LLC) {
> > > +      /* From the PIPE_CONTROL instruction table, bit 26 (Flush LLC):
> > > +       *
> > > +       *    "Project: ALL
> > > +       *     SW must always program Post-Sync Operation to "Write Immediate
> > > +       *     Data" when Flush LLC is set."
> > > +       *
> > > +       * For now, we just require the caller to do it.
> > > +       */
> > > +      assert(flags & PIPE_CONTROL_WRITE_IMMEDIATE);
> > > +   }
> > > +
> > > +   /* "Post-Sync Operation" workarounds -------------------------------- */
> > > +
> > > +   /* Project: All / Argument: Global Snapshot Count Reset [19]
> > > +    *
> > > +    * "This bit must not be exercised on any product.
> > > +    *  Requires stall bit ([20] of DW1) set."
> > > +    *
> > > +    * We don't use this, so we just assert that it isn't used.  The
> > > +    * PIPE_CONTROL instruction page indicates that they intended this
> > > +    * as a debug feature and don't think it is useful in production,
> > > +    * but it may actually be usable, should we ever want to.
> > > +    */
> > > +   assert((flags & PIPE_CONTROL_GLOBAL_SNAPSHOT_COUNT_RESET) == 0);
> > > +
> > > +   if (flags & (PIPE_CONTROL_MEDIA_STATE_CLEAR |
> > > +                PIPE_CONTROL_INDIRECT_STATE_POINTERS_DISABLE)) {
> > > +      /* Project: All / Arguments:
> > > +       *
> > > +       * - Generic Media State Clear [16]
> > > +       * - Indirect State Pointers Disable [16]
> > > +       *
> > > +       *    "Requires stall bit ([20] of DW1) set."
> > > +       *
> > > +       * Also, the PIPE_CONTROL instruction table, bit 16 (Generic Media
> > > +       * State Clear) says:
> > > +       *
> > > +       *    "PIPECONTROL command with “Command Streamer Stall Enable” must be
> > > +       *     programmed prior to programming a PIPECONTROL command with "Media
> > > +       *     State Clear" set in GPGPU mode of operation"
> > > +       *
> > > +       * This is a subset of the earlier rule, so there's nothing to do.
> > > +       */
> > > +      flags |= PIPE_CONTROL_CS_STALL;
> > > +   }
> > > +
> > > +   if (flags & PIPE_CONTROL_STORE_DATA_INDEX) {
> > > +      /* Project: All / Argument: Store Data Index
> > > +       *
> > > +       * "Post-Sync Operation ([15:14] of DW1) must be set to something other
> > > +       *  than '0'."
> > > +       *
> > > +       * For now, we just assert that the caller does this.  We might want to
> > > +       * automatically add a write to the workaround BO...
> > > +       */
> > > +      assert(non_lri_post_sync_flags != 0);
> > > +   }
> > > +
> > > +   if (flags & PIPE_CONTROL_SYNC_GFDT) {
> > > +      /* Project: All / Argument: Sync GFDT
> > > +       *
> > > +       * "Post-Sync Operation ([15:14] of DW1) must be set to something other
> > > +       *  than '0' or 0x2520[13] must be set."
> > > +       *
> > > +       * For now, we just assert that the caller does this.
> > > +       */
> > > +      assert(non_lri_post_sync_flags != 0);
> > > +   }
> > > +
> > > +   if (IS_GENx10_BETWEEN(60, 75) && (flags & PIPE_CONTROL_TLB_INVALIDATE)) {
> > > +      /* Project: SNB, IVB, HSW / Argument: TLB inv
> > > +       *
> > > +       * "{All SKUs}{All Steppings}: Post-Sync Operation ([15:14] of DW1)
> > > +       *  must be set to something other than '0'."
> > > +       *
> > > +       * For now, we just assert that the caller does this.
> > > +       */
> > > +      assert(non_lri_post_sync_flags != 0);
> > > +   }
> > > +
> > > +   if (GEN_GEN >= 7 && (flags & PIPE_CONTROL_TLB_INVALIDATE)) {
> > > +      /* Project: IVB+ / Argument: TLB inv
> > > +       *
> > > +       *    "Requires stall bit ([20] of DW1) set."
> > > +       *
> > > +       * Also, from the PIPE_CONTROL instruction table:
> > > +       *
> > > +       *    "Project: SKL+
> > > +       *     Post Sync Operation or CS stall must be set to ensure a TLB
> > > +       *     invalidation occurs.  Otherwise no cycle will occur to the TLB
> > > +       *     cache to invalidate."
> > > +       *
> > > +       * This is not a subset of the earlier rule, so there's nothing to do.
> > > +       */
> > > +      flags |= PIPE_CONTROL_CS_STALL;
> > > +   }
> > > +
> > > +   if (GEN_GEN == 9 && devinfo->gt == 4) {
> > > +      /* TODO: The big Skylake GT4 post sync op workaround */
> > > +   }
> > > +
> > > +   /* "GPGPU specific workarounds" (both post-sync and flush) ------------ */
> > > +
> > > +   if (IS_COMPUTE_PIPELINE(brw)) {
> > > +      if (GEN_GEN >= 9 && (flags & PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE)) {
> > > +         /* Project: SKL+ / Argument: Tex Invalidate
> > > +          * "Requires stall bit ([20] of DW) set for all GPGPU Workloads."
> > > +          */
> > > +         flags |= PIPE_CONTROL_CS_STALL;
> > >        }
> > >  
> > > -      if (GEN_GEN == 10)
> > > -         gen10_add_rcpfe_workaround_bits(&flags);
> > > -   } else if (GEN_GEN >= 6) {
> > > -      if (GEN_GEN == 6 &&
> > > -          (flags & PIPE_CONTROL_RENDER_TARGET_FLUSH)) {
> > > -         /* Hardware workaround: SNB B-Spec says:
> > > +      if (GEN_GEN == 8 && (post_sync_flags ||
> > > +                           (flags & (PIPE_CONTROL_NOTIFY_ENABLE |
> > > +                                     PIPE_CONTROL_DEPTH_STALL |
> > > +                                     PIPE_CONTROL_RENDER_TARGET_FLUSH |
> > > +                                     PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> > > +                                     PIPE_CONTROL_DATA_CACHE_FLUSH)))) {
> > > +         /* Project: BDW / Arguments:
> > >            *
> > > -          *   [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache Flush
> > > -          *   Enable = 1, a PIPE_CONTROL with any non-zero post-sync-op is
> > > -          *   required.
> > > +          * - LRI Post Sync Operation   [23]
> > > +          * - Post Sync Op              [15:14]
> > > +          * - Notify En                 [8]
> > > +          * - Depth Stall               [13]
> > > +          * - Render Target Cache Flush [12]
> > > +          * - Depth Cache Flush         [0]
> > > +          * - DC Flush Enable           [5]
> > > +          *
> > > +          *    "Requires stall bit ([20] of DW) set for all GPGPU and Media
> > > +          *     Workloads."
> > 
> > This I couldn't find.
> 
> Ah, sorry, this could probably be clearer.
> 
> On the "Programming Restrictions for PIPE_CONTROL" subpages ("Post-Sync
> Operation", "Flush Types"), there are separate table rows for each bit,
> with the same workaround text:
> 
>  +--------------------------------------------------------------------+
>  | LRI Post  | 23 | Requires stall bit ([20] of DW) set for all | BDW |
>  | Sync      |    | GPGPU and Media Workloads.                  |     |
>  | Operation |    |                                             |     |
>  +--------------------------------------------------------------------+
> 
> I just grouped them together because they're all the same workaround...
> just documented in 8 different table rows.
> 
> I'm definitely open to suggestions on how to improve the comment :)

Ah, now I got it. Makes sense. Perhaps just that it combines the rules for
BDW from the two tables.

> 
> Thanks so much for looking at this!  I owe you!

No problem, writing this patch in the first place must have not been that
easy :)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>


More information about the mesa-dev mailing list